MOLECULAR TYPING OF GROUP B STREPTOCOCCI

Field of the invention

The present invention relates to molecular methods of typing group B
streptococci, as well as polynucleotides useful in such methods.

Background to the invention

Group B streptococcus (GBS) - Streptococcus agalactiae - is the
commonest cause of neonatal and obstetric sepsis and an increasingly important
cause of septicaemia in the elderly and immunocompromised patients. The
incidence of neonatal GBS sepsis has been reduced in recent years by the use of
intrapartum antibiotic prophylaxis, but there are many problems with this
approach. In future, vaccination is likely to be preferred and there has been
considerable progress in development of conjugate polysaccharide GBS
vaccines.

Before the introduction of conjugate vaccines, extensive epidemiological
and other related studies will be required to assess not only the burden of
disease but also the distribution of GBS types (including capsular polysaccharide
gene serotypes and serosubtypes; protein antigen gene subtypes; and mobile
genetic element subtypes) to determine the optimal formulation of vaccine
antigens. Type distribution based on one geographic location or small numbers of
patients may not be generally applicable. Continued monitoring will be necessary
to assess the suitability of combinations of GBS vaccine antigens for different
target populations in different geographic locations.
Nine capsular polysaccharide GBS serotypes have been described
(Harrison et al., 1998; Hickman et al., 1999). Various serotyping methods have
been used, including immune-precipitation (Wilkinson and Moody, 1969), enzyme
immunoassay (Holm and Hakansson, 1988), coagglutination (Hakansson et al.,
1992), counter-immunoelectrophoresis and capillary precipitation (Triscott and
Davies, 1979), latex agglutination (Zuerlein et al., 1991), fluorescence microscopy
(Cropp et al., 1974) and inhibition-ELISA (Arakere et al., 1999). These methods
are labour-intensive and require high-titered serotype-specific antisera, which are
expensive and difficult to make and are commercially available for only six
serotypes, Ia to V (Arakere et al., 1999). Molecular genotyping methods, such as
pulsed-field gel electrophoresis (Rolland et al., 1999) and restriction
endonuclease analysis (Nagano et al., 1991), are useful for epidemiological
studies but do not generally identify serotypes. Consequently, there is a need
for a reliable molecular method for GBS serotype identification.

Summary of the invention

We have identified specific regions within the genome of group B
streptococci of inter-type sequence heterogeneity that can be used to distinguish
different types (including capsular polysaccharide gene serotypes and
serosubtypes; protein antigen gene subtypes; and mobile genetic element
subtypes). We have shown that molecular methods that detect these sequence
heterogeneities can be used to accurately distinguish and type group B
streptococci.

Accordingly in a first aspect the present invention provides a method of
typing a group B streptococcal bacterium which method comprises analysing the
nucleotide sequence of one or more regions within the cpsD, cpsE, cpsF, cpsG
and/or cpsI/M genes of said bacterium, said region(s) comprising one or more
nucleotides whose sequence varies between types.

In particular, the nucleotide sequence may be analysed for one or more 
positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 204, 211, 
281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 
803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

Preferably at least one region is within a sequence delineated by the 3'
136 bases of the cpsE gene and the 5' 218 bases of the cpsG gene of the
cpsE-cpsF-cpsG gene cluster of said group B streptococcal bacterium. In
particular, the nucleotide sequence may be analysed for one or more positions
corresponding to positions 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595,
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088,
2134, 2187 and 2196 as shown in Figure 1.

In one embodiment, at least one region is within the cpsI/M genes of said
group B streptococcal bacterium.

We have also shown that a number of surface protein antigen genes,
including the rib, alp2 and alp3 genes, and five mobile genetic elements may be
used for molecular subtyping of GBS. Accordingly, the present invention also
provides a method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more surface protein antigen genes selected from a rib, alp2 or alp3
gene, and/or one or more mobile genetic elements selected from IS861, IS1548,
IS1381, ISSa4 and GBSi1. Preferably, such a method is combined with the above
methods of the invention.

The nucleotide sequence analysis step may comprise sequencing said
one or more regions. Alternatively, or in addition, the nucleotide sequence
analysis step may comprise determining whether a polynucleotide obtained from
said bacterium selectively hybridises to a polynucleotide probe comprising one or
more of the said regions, preferably to one or more of a plurality of polynucleotide
probes corresponding to one or more of the said regions.

In a preferred embodiment, where hybridisation to a plurality of probes is
used as a means of analysis, the plurality of polynucleotide probes are present as
a microarray.

In another embodiment, the nucleotide sequence analysis step comprises
an amplification step using one or more primers, at least one of which hybridises
specifically to a sequence which differs between types. Typically, primer pairs
are used, at least one member of which hybridises specifically to a sequence
which differs between types. Preferably, said primers are selected from the
primers shown in Table 2 and/or Table 6 and/or Table 10.

In a second aspect, the present invention provides a polynucleotide
consisting essentially of at least 10 contiguous nucleotides corresponding to a
region within a cpsD-cpsE-cpsF-cpsG gene of a group B streptococcal bacterium,
said polynucleotide comprising one or more nucleotides which differ between
GBS types.

Preferably the nucleotides which differ between GBS types correspond to
one or more of positions 62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249,
300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026,
1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595,
1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088,
2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a sequence
delineated by the 3' 136 base pairs of cpsE and the 5' 218 base pairs of cpsG of
the cpsE-cpsF-cpsG gene cluster of a group B streptococcal bacterium, said
polynucleotide comprising one or more nucleotides which differ between GBS
types.
Preferably the nucleotides which differ between group B streptococcal
types correspond to one or more of positions 1413, 1495, 1500, 1501, 1512,
1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 1892,
1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.

The present invention also provides a polynucleotide consisting essentially
of at least 10 contiguous nucleotides corresponding to a region within a cpsI/M
gene of a group B streptococcal bacterium, said polynucleotide comprising one or
more nucleotides which differ between group B streptococcal types.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 2. 

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
a rib, alp2 or alp3 gene of a group B streptococcal bacterium, said polynucleotide
comprising one or more nucleotides which differ between GBS protein antigen
gene subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 6. 

The present invention further provides a polynucleotide consisting
essentially of at least 10 contiguous nucleotides corresponding to a region within
IS861, IS1548, IS1381, ISSa4 and/or GBSi1 of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ
between GBS mobile genetic element subtypes.

Preferably the polynucleotide is selected from the nucleotide sequences 
shown in Table 10. 

The polynucleotides of the invention may be used in a method of typing, 
such as serotyping and/or subtyping, a group B streptococcal bacterium. 

In a third aspect the present invention provides a composition comprising a 
plurality of polynucleotides of the second aspect of the invention. The 
composition may be used in a method of typing, such as serotyping and/or 
subtyping, a group B streptococcal bacterium. 

In a fourth aspect the present invention provides a microarray comprising a 
plurality of polynucleotides according to the second aspect of the invention. The 
microarray may be used in a method of typing, such as serotyping and/or 
subtyping, a group B streptococcal bacterium. 

Detailed description of the invention 

Unless defined otherwise, all technical and scientific terms used herein
have the same meaning as commonly understood by one of ordinary skill in the
art (e.g., in cell culture, molecular genetics, nucleic acid chemistry, hybridization
techniques and biochemistry). Standard techniques are used for molecular,
genetic and biochemical methods (see generally, Sambrook et al., Molecular
Cloning: A Laboratory Manual, 3rd ed. (2001) Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y. and Ausubel et al., Short Protocols in Molecular
Biology (1999) 4th Ed, John Wiley & Sons, Inc. - and the full version entitled
Current Protocols in Molecular Biology, which are incorporated herein by
reference) and chemical methods.

The molecular typing methods of the present invention rely on detecting
the presence in a sample of specific polynucleotide sequences in regions of the
genome of group B streptococci (GBS) that we have identified as varying
between different types.

More specifically, the specific polynucleotide sequences that are to be
detected lie within the cpsD, cpsE, cpsF, cpsG, cpsI, cpsM, rib, alp2 and/or alp3
genes of GBS as well as the mobile genetic elements IS861, IS1548, IS1381,
ISSa4 and GBSi1, preferably the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes.
Regions of interest within those genes mentioned are regions whose
sequence varies between two or more types, i.e. are heterogeneous.
Heterogeneity may be due to insertions, deletions and/or substitutions between
corresponding regions in different types. In the case of rib, alp2 and alp3,
heterogeneity typically takes the form of the presence or absence of the entire
gene. Similarly, for elements IS861, IS1548, IS1381, ISSa4 and GBSi1,
heterogeneity typically takes the form of the presence or absence of the entire
sequence.

Specific regions of heterogeneity include the following positions within:
cpsD gene - 62 and 78-86; cpsD-cpsE gene spacer - 138, 139 and 144; cpsE
gene - 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486,
602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413,
1495, 1500, 1501, 1512, 1518 and 1527; cpsF gene - 1595, 1611, 1620, 1627,
1629, 1655, 1832, 1856, 1866, 1871, 1892 and 1971; and cpsG gene - 2026,
2088, 2134, 2187 and 2196 (numbering corresponds to numbering shown in
Figure 1).

Particularly preferred positions of interest are those that lie within a 790 bp
fragment of cpsE-cpsF-cpsG (which consists of approximately the 3' 136 bases
of cpsE to the 5' 218 bases of cpsG), namely positions 1413, 1495, 1500, 1501,
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871,
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1.
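
As an illustration only, the preferred positions listed above can be held in a simple list and used to compare two sequences. The following Python sketch assumes that both sequences are already aligned to the Figure 1 numbering (1-based, with no gaps); the names FRAGMENT_POSITIONS, bases_at and differing_positions are hypothetical and are not part of the described method.

```python
# Heterogeneous positions within the 790 bp cpsE-cpsF-cpsG fragment,
# copied from the text (numbering as in Figure 1).
FRAGMENT_POSITIONS = [
    1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629,
    1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187, 2196,
]


def bases_at(sequence, positions):
    """Return {position: base} for 1-based positions that fall inside sequence."""
    return {p: sequence[p - 1] for p in positions if 1 <= p <= len(sequence)}


def differing_positions(seq_a, seq_b, positions=FRAGMENT_POSITIONS):
    """List the positions of interest at which two aligned sequences differ.

    Assumption: both sequences use the same numbering and contain no gaps.
    Differences at positions outside the list are deliberately ignored.
    """
    a = bases_at(seq_a, positions)
    b = bases_at(seq_b, positions)
    return sorted(p for p in a if p in b and a[p] != b[p])
```

Only the listed positions are inspected, so a difference elsewhere in the sequence does not affect the result.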

Another region of heterogeneity is position 62 of cpsD and a repetitive
sequence (TTACGGCGA) found at positions 78 to 86 of cpsD in some but not all
GBS serotypes.

Specific regions of heterogeneity also include a number of positions within
the cpsI/M gene as shown in the sequence alignment depicted in Figure 3.

These regions of heterogeneity may be analysed using a variety of means
including sequencing, PCR and binding of labelled probes.

In the case of sequencing to identify serotype, the sequencing primers are
selected such that they hybridise specifically to a sequence within or near a
region of heterogeneity. The primers need not be specific to particular serotypes
since it is the actual sequence information obtained during the sequencing
process that is used to assign molecular serotype. Thus the primers may
hybridise specifically to all GBS serotypes (at least serotypes Ia to VII), or to
specific serotypes.

Preferred primers anneal within 100, 50 or 20 contiguous nucleotides of a
heterogeneous position within the 790 bp region of cpsE-cpsF-cpsG shown in
Figure 1. Examples of suitable sequencing primers are shown in Table 2
(cpsES3, cpsFA, cpsFS, cpsGA and cpsGM).

PCR and other specific hybridisation-based serotyping methods will
typically involve the use of nucleotide primers/probes which bind specifically to a
region of the genome of a GBS serotype which includes a nucleotide which varies
between two or more serotypes. Thus the primers/probes may comprise a
sequence which is complementary to one of such regions. Where positions of
heterogeneity are close together (e.g. positions 198, 204, 211 and 218 of cpsE), it
may be desirable to use a primer/probe which hybridises specifically to a region
of the GBS genome that comprises two or more positions of heterogeneity. Thus
for example, a primer/probe may be designed that is complementary to
nucleotides 195 to 220 of cpsE. Such primers/probes are likely to have improved
specificity and reduce the likelihood of false positives.
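
The idea of a single probe spanning several neighbouring positions can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the template is given as a plain 5' to 3' string numbered as in the text, and the helper names (probe_window, reverse_complement, covered_positions) are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_window(sequence, start, end):
    """Return the template segment from start to end (1-based, inclusive)."""
    return sequence[start - 1:end]


def covered_positions(start, end, positions):
    """Return the positions of heterogeneity that fall inside the window."""
    return [p for p in positions if start <= p <= end]


# Example from the text: a probe over nucleotides 195 to 220 spans the
# four neighbouring positions 198, 204, 211 and 218.
clustered = covered_positions(195, 220, [198, 204, 211, 218])
```

A probe complementary to the window would then be obtained as reverse_complement(probe_window(template, 195, 220)), where template is the sequence of the serotype being targeted.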

PCR-based methods of detection may rely upon the use of primer pairs, at
least one of which binds specifically to a region of interest in one or more, but not
all, serotypes. Unless both primers bind, no PCR product will be obtained.
Consequently, the presence or absence of a specific PCR product may be used
to determine the presence of a sequence indicative of specific GBS serotypes.
However, as mentioned, only one primer need correspond to a region of
heterogeneity in the genes of interest (such as the cpsD, cpsE, cpsF, cpsG, cpsI
and/or cpsM genes). The other primer may bind to a conserved or heterogeneous
region within said gene or even a region within another part of the GBS genome,
such as the cpsH gene, whether said region is conserved or heterogeneous
between serotypes. Thus, for example, a combination of a primer (cpsGS) which
binds to a region of the cpsG gene including positions 2172 to 2210, and a primer
which binds to a region of the cpsH gene which is heterogeneous (IacpsHA1,
IIIcpsHA), may be used as the basis of distinguishing serotypes (Ia and III).

Further, a primer which binds to a region of cpsI which is heterogeneous
may be combined with a primer which binds to a region of cpsG which is
constant. An example of such a primer pair is VIcpsIA and cpsGS1, which gives
rise to a PCR product of 1517 bp and is specific for GBS serotype VI.

Alternatively, primers that bind to conserved regions of the GBS genome 
but which flank a region whose length varies between serotypes may be used. In 
this case, a PCR product will always be obtained when GBS bacteria are present 
but the size of the PCR product varies between serotypes. 

Furthermore, a combination of specific binding of one or both primers and
variations in the length of the PCR product may be used as a means of
identifying particular molecular serotypes.
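
The presence/absence and product-size logic described above can be sketched in a few lines of Python. This is a simplified model, not a description of the actual assay: it assumes exact string matching stands in for specific primer binding, that the template is the 5' to 3' top strand, and that both primers are written 5' to 3'. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pcr_product_length(template, forward, reverse):
    """Return the expected product length, or None if either primer fails to bind.

    Assumption: a primer "binds" only where it matches the template exactly.
    The reverse primer anneals to the opposite strand, so its reverse
    complement is searched for downstream of the forward primer.
    """
    template = template.upper()
    fwd_start = template.find(forward.upper())
    if fwd_start == -1:
        return None
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start + len(forward))
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - fwd_start
```

With a type-specific primer, a return value of None for some templates and a length for others models the presence/absence readout; with two conserved primers flanking a region of variable length, the returned lengths differ between templates.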

Examples of specific primers/probes which target the cpsD, cpsE, cpsF,
cpsG, cpsI or cpsM genes include the following:

cpsDS     GCA AAA GAA CAG ATG GAA CAA AGT GG
cpsES     CTT TTG GAG TCG TGG CTA TCT TG
cpsEA1    GA/T/GA AAA AAG GAA AGT CGT GTC G/ATT G
cpsES1    CTT GGA C/TTC CTC TGA AAA GGA TTG
cpsEA2    AAA A/CGC TTG ATC AAC AGT TAA GCA GG
cpsES2    GAT GGT/C GGA CCG GCT ATC TTT TCT C
cpsEA3    CTT AAT TTG TTC TGC ATC TAC TCG C
cpsES3    GTT AGA TGT TCA ATA TAT CAA TGA ATG GTC TAT TTG GTC AG
cpsEFA    CCT TTC AAA CCT TAC CTT TAC TTA GC
cpsFS     CAT CTG GTG CCG CTG TAG CAG TAC CAT T
cpsFA     GTC GAA AAC CTC TAT A/GT A MOT GGT CTT ACA A/GCC AAA TAA CTT ACC
cpsGA     AAG/C AGT TCA TAT CAT CAT ATG AGA G
cpsGA1    CCG CCA/G TGT GTG ATA ACA ATC TCA GCT TC
cpsGS     ATG ATG ATA TGA ACT CTT ACA TGA AAG AAG CTG AGA TTG
cpsGS1    GAA CTC TTA CAT GAA AGA AGC TGA GAT TGT TAT CAC AC
IbcpsIA   CTA TCA ATG AAT GAG TCT GTT GTA GGA CGG ATT GCA CG
IbcpsIS   GAT AAT AGT GGA GAA ATT TGT GAT AAT TTA TCT CAA AAA GACG
IbcpsIA1  CCT GAT TCA TTG CAG AAG TCT TTA CGA TGC GAT AGG TG
IVcpsMA   GGG TCA ATT GTA TCG TCG CTG TCA ACA AAA CCA ATC AAA TC
VcpsMA    CCC CCC ATA AGT ATA AAT AAT ATC CAA TCT TGC ATA GTC AG
VIcpsIA   GAA GCA AAG ATT CTA CAC AGT TCT CAA TCA CTA ACT CCG
cpsIA     GTA TAA CTT CTA TCA ATG GAT GAG TCT GTT GTA GTA CGG

The primer designations correspond to those given in Table 2. 
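
Several of the sequences listed above contain slash notation such as "A/G". Assuming this denotes alternative bases at a single position (the usual convention for degenerate primers, but an assumption here since the text does not define it), the individual oligonucleotides in such a mixture can be enumerated as in the following Python sketch. The function names are hypothetical.

```python
from itertools import product


def parse_degenerate(primer):
    """Split a primer written with slash notation into per-position base lists.

    Assumption: "X/Y" means that either X or Y may occupy one position, and
    "X/Y/Z" means three alternatives at that position. Spaces are ignored.
    """
    sites = []
    expecting_alternative = False
    for ch in primer.replace(" ", "").upper():
        if ch == "/":
            if not sites or expecting_alternative:
                raise ValueError("misplaced '/' in primer")
            expecting_alternative = True
        elif expecting_alternative:
            if ch not in sites[-1]:
                sites[-1].append(ch)
            expecting_alternative = False
        else:
            sites.append([ch])
    if expecting_alternative:
        raise ValueError("primer ends with '/'")
    return sites


def expand_primer(primer):
    """Return every unambiguous sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*parse_degenerate(primer))]


# One of the degenerate sequences listed above: a single two-way position.
variants = expand_primer("CCG CCA/G TGT GTG ATA ACA ATC TCA GCT TC")
```

Each variant can then be checked against a target sequence individually; a primer with n two-way positions expands to 2^n variants.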

In relation to the alp2, alp3 and rib surface protein antigen genes,
heterogeneity and protein antigen gene subtype is assessed more at the level of
whether a group B streptococcal bacterium contains the gene or not. Our results
show that the specific combination of surface protein genes present in a GBS
genome is indicative of serotype/serosubtypes (see Table 9). Consequently,
primers/probes suitable for use in the methods of the present invention are those
that are specific for the particular genes. Thus probes/primers that are specific
for alp2 or alp3 or rib are preferred. Figure 4 shows an alignment of alp2 and
alp3 that was used to design primers specific for alp2 or specific for alp3.

Examples of specific primers/probes which target the alp2, alp3 and rib
genes include the following:

bcaS1     GGT AAT CTT AAT ATT TTT GAA GAG TCA ATA GTT GCT GCA TCT AC
bcaS2     CCAGGGA GTG CAG CGA CCT TAA ATA CAA GCA TC
balS      GAT CCT CAA AAC CTC ATT GTA TTA AAT CCA TCA AGC TAT TC
balA      CCA GTT AAG ACT TCA TCA CGA CTC CCA TCA C
bal23S1   CAG ACT GTT AAA GTG GAT GAA GAT ATT ACC TTT ACG G
bal23S2   CTT AAA GCT AAG TAT GAA AAT GAT ATC ATT GGA GCT CGT G
bal2S     CTT CCG CCA GAT AAA ATT AAG
bal2A     CTG TTG ACT TAT CTG GAT AGG TC
bal2A1    CGT GTT GTT CAA CAG TCC TAT GCT TAG CCT CTG GTG
bal2A2    GGT ATC TGG TTT ATG ACC ATT TTT CCA GTT ATA CG
bal3S     GTT CTT CCG CTT AAG GAT AG
bal3A     GAC CGT TTG GTC CTT ACC TTT TGG TTC GTT GCT ATC C
ribS2     GAAGTAATTTCAG GAA GTG CTG TTA CGT TAA ACA CAA ATA TG
ribA1     GAA GGT TGT GTG AAA TAA TTG CCG CCT TGC CTA ATG
ribA2     AAT ACT AGC TGC ACC AAC AGT AGT CAA TTC AGA AGG
The primer designations correspond to those given in Table 6. 

In relation to IS861, IS1548, IS1381, ISSa4 and GBSi1, heterogeneity
and subtype is assessed more at the level of whether a group B streptococcal
bacterium contains the element or not. The number of elements may also be
assessed. Our results show that the specific combination of mobile elements
present in a GBS genome is indicative of serotype/serosubtype (see Table 12).
Consequently, primers/probes suitable for use in the methods of the present
invention are those that are specific for the particular mobile genetic elements.
Thus probes/primers that are specific for IS861, IS1548, IS1381, ISSa4 and
GBSi1 are preferred.
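
Because subtyping with surface protein genes and mobile genetic elements rests on presence or absence, a result can be recorded as a simple profile. The Python sketch below is illustrative only: the marker names come from the text, while the function names and the "+/-" string encoding are assumptions made for this example. It does not reproduce the profile-to-serotype associations of Tables 9 and 12.

```python
# Marker names as given in the text.
SURFACE_PROTEIN_GENES = ("rib", "alp2", "alp3")
MOBILE_ELEMENTS = ("IS861", "IS1548", "IS1381", "ISSa4", "GBSi1")


def marker_profile(detected, markers):
    """Map each marker to True/False depending on whether it was detected."""
    detected = set(detected)
    return {marker: marker in detected for marker in markers}


def profile_string(profile):
    """Encode a profile as '+'/'-' characters in marker order (hypothetical format)."""
    return "".join("+" if present else "-" for present in profile.values())


# Example: an isolate in which only IS861 and GBSi1 gave a PCR product.
example = profile_string(marker_profile({"IS861", "GBSi1"}, MOBILE_ELEMENTS))
```

Two isolates can then be compared by comparing their profile strings for the same ordered set of markers.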

Examples of specific primers/probes which target IS861, IS1548, IS1381,
ISSa4 and GBSi1 include the following:

IS861S    GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG
IS861A1   CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C
IS861A2   CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG
IS1548S   CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC
IS1548S1  GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG
IS1548A1  CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC
IS1548A2  CCC AAT ACC ACG TAA CTT ATG CCA TTT G
IS1548A3  CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC
IS1381S1  CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG
IS1381S2  GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG
IS1381A   CTA AAA TCC TAG TTC ACG GTT GAT CAT TCC AGC
ISSa4S    CGT ATC TGT CAC TTA TTT CCC TGC GGG TGT CTC C
ISSa4A1   GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G
ISSa4A2   CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC
GBSi1S1   CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG
GBSi1S2   GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC
GBSi1A1   AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC
GBSi1A2   CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG

Preferably, the primers/probes comprise at least 10, 15 or 20 nucleotides.
Typically, primers/probes consist of fewer than 100, 50 or 30 nucleotides.
Primers/probes are generally polynucleotides comprising deoxynucleotides. They
may also be polynucleotides which include within them synthetic or modified
nucleotides. A number of different types of modification to oligonucleotides are
known in the art. These include methylphosphonate and phosphorothioate
backbones, and the addition of acridine or polylysine chains at the 3' and/or 5'
ends of the molecule. For the purposes of the present invention, it is to be
understood that the polynucleotides described herein may be modified by any
method available in the art. Primers/probes may be labelled with any suitable
detectable label such as radioactive atoms, fluorescent molecules or biotin.

In one embodiment, primers/probes have a high melting temperature of
>70°C so that they may be used in rapid cycle PCR.

Compositions comprising a plurality of nucleotides that are used to analyse
one or more regions within the cpsD, cpsE, cpsF, cpsG, or cpsI/M genes may
also further comprise nucleotides that may be used to analyse one or more
regions within the cpsH gene. Suitable nucleotides are described in the
Examples, although a person skilled in the art could design other suitable
sequences based on the sequence alignment shown in Figure 3.

Further, compositions comprising a plurality of nucleotides that are used to
analyse one or more regions within alp2, alp3 or rib genes may also further
comprise nucleotides that may be used to analyse one or more regions within the
C alpha (bca) and C beta (bac) genes (C beta gene also known as bag).

A variety of techniques may be used to analyse one or more regions within
the genome of a bacterium of interest. Typically, a sample of interest which is
suspected of containing GBS bacteria is treated, using standard techniques, to
obtain genomic DNA from any microorganisms present in the sample. It may be
desirable for a number of subsequent detection steps to use nucleic acid
preparation techniques that result in substantial fragmentation of the genomic
DNA. The sample may be from a bacterial culture or a clinical sample from a
patient, typically a human patient. Clinical samples may be cultured to produce a
bacterial culture. However, it is also possible to test clinical samples directly
without a culturing step.

The genomic DNA is then subjected to one or more analysis steps which
may include sequencing, enzymatic amplification and/or hybridisation. These
general techniques of DNA analysis are known in the art and are discussed in
detail in, for example, Sambrook et al. 2001 and Ausubel et al. 1999 supra.

Serotyping may involve one or more steps. For example, it may be
desirable to carry out an initial step of determining whether there are nucleotide
sequences present in the sample which are conserved between GBS serotypes
but not found in any other organism. This may be achieved by using PCR
primers that detect any (but only) GBS bacteria (e.g. using primer pairs
Sag59/Sag190 and/or DSF2/DSR1 - see Tables 2 and 3).

Molecular serotyping for specific GBS serotypes can then be performed by
detecting the presence of one or more regions of heterogeneity in the regions of
interest using any suitable technique such as sequencing, enzymatic
amplification and/or hybridisation based on the probes/primers discussed above.
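
The two-stage approach just described (first confirm GBS, then look for type-specific sequences) can be outlined as below. This Python sketch is a generic illustration and is not the algorithm of Figure 2; run_assay is a hypothetical callable that returns True when the named primer pair yields a product, and the assay labels are simply the primer-pair names mentioned in the text.

```python
def type_sample(run_assay, gbs_assays, serotype_assays):
    """Two-stage typing sketch.

    run_assay       -- hypothetical callable: assay name -> True/False
    gbs_assays      -- names of GBS-specific assays (any positive suffices)
    serotype_assays -- mapping of serotype label -> type-specific assay name
    """
    # Stage 1: is any GBS-specific target detected?
    if not any(run_assay(name) for name in gbs_assays):
        return {"is_gbs": False, "serotype_hits": []}
    # Stage 2: record every type-specific assay that is positive.
    hits = [label for label, name in serotype_assays.items() if run_assay(name)]
    return {"is_gbs": True, "serotype_hits": hits}


# Usage with made-up results standing in for gel or real-time readouts.
results = {"Sag59/Sag190": True, "DSF2/DSR1": False, "VIcpsIA/cpsGS1": True}
outcome = type_sample(
    results.get,
    gbs_assays=["Sag59/Sag190", "DSF2/DSR1"],
    serotype_assays={"VI": "VIcpsIA/cpsGS1"},
)
```

Skipping stage 2 when stage 1 is negative avoids reporting type-specific hits from a sample in which GBS was not detected.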

A particularly preferred detection technique is PCR, such as rapid cycle
PCR (Kong et al., 2000).

An example of a multi-step serotyping strategy (algorithm) is shown in
Figure 2. However, a variety of other strategies are envisaged and can be
designed by the skilled person using the sequence heterogeneity information
presented herein. In particular, it is preferred that the serotyping procedure
comprise at least one analysis step based on analysing one or more regions of
the cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes. This analysis may optionally be
combined with an analysis of one or more regions within the cpsH gene. Similar
techniques may be used to analyse the cpsH gene regions and suitable primer
sequences and methods are also described in the Examples.

Analysis of the presence or absence of the alp2, alp3 and/or rib genes may
optionally be combined with an analysis of the presence or absence of C alpha
(bca gene) and C beta (bac gene) sequences as is described in the Examples.
Similar techniques may be used to analyse these regions and suitable primer
sequences and PCR methods are also described in the Examples.

Furthermore, analysis of the presence or absence of the alp2, alp3 and/or
rib genes (and optionally the bca and bac genes) may be combined with an
analysis of the presence or absence of mobile genetic elements.

Thus a typing strategy may involve an analysis of cps genes, surface
protein genes and/or mobile genetic elements in various combinations to provide
more serosubtyping and subtyping information.

Analysis of GBS genomic sequences using the above techniques may
take place in solution followed by standard resolution using methods such as gel
electrophoresis. However in a preferred aspect of the invention, the
primers/probes are immobilised onto a solid substrate to form arrays.

The polynucleotide probes are typically immobilised onto or in discrete
regions of a solid substrate. The substrate may be porous to allow immobilisation
within the substrate or substantially non-porous, in which case the probes are
typically immobilised on the surface of the substrate. Examples of suitable solid
substrates include flat glass (such as borosilicate glass), silicon wafers, mica,
ceramics and organic polymers such as plastics, including polystyrene and
polymethacrylate. It may also be possible to use semi-permeable membranes
such as nitrocellulose or nylon membranes, which are widely available. The
semi-permeable membranes may be mounted on a more robust solid surface such
as glass. The surfaces may optionally be coated with a layer of metal, such as
gold, platinum or other transition metal.

Preferably, the solid substrate is generally a material having a rigid or
semi-rigid surface. In preferred embodiments, at least one surface of the
substrate will be substantially flat, although in some embodiments it may be
desirable to physically separate synthesis regions for different polymers with, for
example, raised regions or etched trenches. It is also preferred that the solid
substrate is suitable for the high density application of DNA sequences in discrete
areas of typically from 50 to 100 µm, giving a density of 10000 to 40000 cm⁻².
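
The quoted density range follows from simple geometry. As a worked check, the Python snippet below assumes the discrete areas are laid out on a square grid whose centre-to-centre spacing equals the quoted size (an assumption; the text does not state the layout). The function name is hypothetical.

```python
def spots_per_cm2(pitch_um):
    """Spots per square centimetre for a square grid with the given pitch in µm."""
    spots_per_side = 10_000 / pitch_um  # 1 cm = 10,000 µm
    return spots_per_side ** 2


low = spots_per_cm2(100)   # 100 spots per side
high = spots_per_cm2(50)   # 200 spots per side
```

Under this assumption a 100 µm pitch gives 10000 spots per cm² and a 50 µm pitch gives 40000 spots per cm², matching the range stated above.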

The solid substrate is conveniently divided up into sections. This may be
achieved by techniques such as photoetching, or by the application of
hydrophobic inks, for example teflon-based inks (Cel-line, USA). Discrete
positions, in which each different probe is located, may have any convenient
shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc.

Attachment of the library sequences to the substrate may be by covalent
or non-covalent means. The library sequences may be attached to the substrate
via a layer of molecules to which the library sequences bind. For example, the
probes may be labelled with biotin and the substrate coated with avidin and/or
streptavidin. A convenient feature of using biotinylated probes is that the
efficiency of coupling to the solid substrate can be determined easily. Since the
polynucleotide probes may bind only poorly to some solid substrates, it is often
necessary to provide a chemical interface between the solid substrate (such as in
the case of glass) and the probes. Thus, the surface of the substrate may be
prepared by, for example, coating with a chemical that increases or decreases
the hydrophobicity or coating with a chemical that allows covalent linkage of the
polynucleotide probes. Some chemical coatings may both alter the hydrophobicity
and allow covalent linkage. Hydrophobicity on a solid substrate may readily be
increased by silane treatment or other treatments known in the art. Examples of
suitable chemical coatings include polylysine and poly(ethyleneimine). Further
details of methods for attachment are provided in US Patent No. 6,248,521.
Methods for immobilizing nucleic acids by introduction of various functional
groups to the molecules are also described in Bischoff et al., 1987 (Anal.
Biochem., 164:336-344) and Kremsky et al., 1987 (Nucl. Acids Res. 15:2891-
2910).

Techniques for producing immobilised arrays of nucleic acid molecules have
been described in the art. A useful review is provided in Schena et al., 1998,
TibTech 16: 301-306, which also gives references for the techniques described
therein.

Microarray-manufacturing technologies fall into two main categories:
synthesis and delivery. In the synthesis approaches, microarrays are prepared in
a stepwise fashion by the in situ synthesis of nucleic acids from biochemical
building blocks. With each round of synthesis, nucleotides are added to growing
chains until the desired length is achieved. A number of prior art methods describe
how to synthesise single-stranded nucleic acid molecule libraries in situ, using for
example masking techniques (photolithography) to build up various permutations of
sequences at the various discrete positions on the solid substrate. U.S. Patent No.
5,837,832 describes an improved method for producing DNA arrays immobilised to
silicon substrates based on very large scale integration technology. In particular,
U.S. Patent No. 5,837,832 describes a strategy called "tiling" to synthesize specific
sets of probes at spatially-defined locations on a substrate which may be used to
produce the immobilised DNA libraries of the present invention. U.S. Patent No.
5,837,832 also provides references for earlier techniques that may also be used.

The delivery technologies, by contrast, use the exogenous deposition of
pre-prepared biochemical substances for chip fabrication. For example, DNA may
be printed directly onto the substrate using robotic devices equipped with
either pins (mechanical microspotting) or piezoelectric devices (ink
jetting). In mechanical microspotting, a biochemical sample is loaded into a
spotting pin by capillary action, and a small volume is transferred to a solid
surface by physical contact between the pin and the solid substrate. After the first
spotting cycle, the pin is washed and a second sample is loaded and deposited to
an adjacent address. Robotic control systems and multiplexed printheads allow
automated microarray fabrication. Ink jetting involves loading a biochemical
sample, such as a polynucleotide, into a miniature nozzle equipped with a
piezoelectric fitting; an electrical current is then used to expel a precise amount
of liquid from the jet onto the substrate. After the first jetting step, the jet is
washed and a second sample is loaded and deposited to an adjacent address. A
repeated series of cycles with multiple jets enables rapid microarray production.

In one embodiment, the microarray is a high density array, comprising
greater than about 50, preferably greater than about 100 or 200 different nucleic
acid probes. Such high density arrays comprise a probe density of greater than
about 50, preferably greater than about 500, more preferably greater than about
1,000, most preferably greater than about 2,000 different nucleic acid probes per
cm². The array may further comprise mismatch control probes and/or reference
probes (such as positive controls).

Microarrays of the invention will typically comprise a plurality of
primers/probes as described above. The primers/probes may be grouped on the
array in any order. However, it may be desirable to group primers/probes
according to types (capsular polysaccharide gene serotypes, serosubtypes;
protein antigen gene subtypes; mobile genetic element subtypes), or groups of
types (capsular polysaccharide gene serotypes, serosubtypes; protein antigen
gene subtypes; mobile genetic element subtypes) for which they are specific.
Such grouping may be arranged such that the resulting patterns are easily
susceptible to pattern recognition by computer software.

Elements in an array may contain only one type of probe/primer or a 
number of different probes/primers. 

Detection of binding of GBS genomic DNA to immobilised probes/primers
may be performed using a number of techniques. For example, the immobilised
probes which are specific to a number of types (capsular polysaccharide gene
serotypes, serosubtypes; protein antigen gene subtypes; mobile genetic element
subtypes), may function as capture probes. Following binding of the genomic
DNA to the array, the array is washed and incubated with one or more labelled
detection probes which hybridise specifically to regions of the GBS genome
which are conserved. The binding of these detection probes may then be
determined by detecting the presence of the label. For example, the label may
be a fluorescent label and the array may be placed in an X-Y reader under a
charge-coupled device (CCD) camera.

Other techniques include labelling the genomic DNA prior to contact with 
the array (using nick-translation and labelled dNTPs for example). Binding of the 
genomic DNA can then be detected directly. 

It is also possible to employ a single PCR amplification step using labelled
dNTPs. In this embodiment, the genomic DNA fragment binds to a first primer
present in the array. The addition of polymerase, dNTPs (including some labelled
dNTPs) and a second primer results in synthesis of a PCR product incorporating
labelled nucleotides. The labelled PCR fragment captured on the plate may then
be detected.

A number of available detection techniques do not require labels but
instead rely on changes in mass upon ligand binding (e.g. surface plasmon
resonance, SPR). The principles of SPR and the types of solid substrates
required for use in SPR (e.g. BIACore chips) are described in Ausubel et al.,
1999, supra.

C. Uses

As discussed above, group B streptococcus (GBS) - Streptococcus
agalactiae - is the commonest cause of neonatal and obstetric sepsis and an
increasingly important cause of septicaemia in the elderly and
immunocompromised patients. Thus, the detection methods, probes/primers and
microarrays of the invention may be used in the diagnosis of GBS infections in
pregnant women, elderly and/or immunocompromised patients. The PCR and
microarray techniques described herein may be of particular use in routine
antenatal screening of pregnant women as well as in diagnosing infections in
pregnant women given the increased accuracy and sensitivity compared to
conventional identification and serotyping. These methods are also likely to give
faster results since it will not generally be necessary to culture clinical samples to
obtain enough material. Further, the molecular techniques can be used in most
laboratories without the need for specialist expertise or reagents.

The molecular typing methods of the invention may also assist in
comprehensive strain identification that will be useful for epidemiological and
other related studies that will be needed to monitor GBS isolates before and after
introduction of GBS conjugate vaccines.

The present invention will now be described in more detail with reference
to the following examples, which are illustrative only and non-limiting. The
examples refer to Figures:

Detailed description of the Figures.

Figure 1. Molecular serotype identification based on the sequence heterogeneity
of the 3'-end of cpsD-cpsE-cpsF and the 5'-end of cpsG (relevant primers are
shown).

Figure 2. Algorithm for GBS molecular serotype (MS) identification by PCR and
sequencing.

Figure 3. Multiple sequence alignments of the gene sequences of
cpsG-cpsH-cpsI/M for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons
are highlighted in bold).

Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158,
upper lines) and alp3 (AF291065, lower lines) used to distinguish them (relevant
primers are shown).

Figure 5. Genetic relationship of 194 invasive Australasian GBS strains (or 56
genotypes).

Notes for column headed "Genetic Markers of GBS genotypes":
Protein antigen gene profile codes are:
"A": 5'-end of bca positive;
"a" or "as": bca repetitive unit or bca repetitive unit-like region positive,
with multiple or single band amplicons, respectively;
"B": bac positive;
"R": rib positive;
"alp2": alp2 positive;
"alp3": alp3 positive;
"None": isolate contains none of the above protein genes.
The molecular markers in bold type show the common features in each cluster.

Notes for column headed "No. of strains":

After "+" are the numbers of CSF isolates; the others are blood isolates.

Notes for column headed "Genotypes":

Each genotype was characterized by a distinct combination of the cps
genes, protein gene profiles and mobile genetic elements. The predominant
genotype in each serotype was named the number "1" genotype of that
serotype.

Notes for the dendrogram:

At about distance 16, the 56 genotypes could be separated into 8 clusters
(1-8); at about distance 22.5 the 56 genotypes could be separated into 3 cluster
groups (A, B, C).

EXAMPLES

MATERIALS AND METHODS

GBS reference strains and clinical isolates.

A panel of nine GBS serotypes (Ia to VIII) was kindly provided by Dr
Lawrence Paoletti, Channing Laboratory, Boston, USA (reference panel 1). Dr
Diana Martin, Streptococcus Reference Laboratory, at ESR, Wellington, New
Zealand, provided another panel of nine international reference GBS type-strains
including serotypes Ia to VI (reference panel 2) (Table 1). In addition, we tested
isolates from 205 clinical cases including 146 which had been referred from
various laboratories in New Zealand for serotyping and 59 isolated from normally
sterile sites over a period of 10 years in one diagnostic laboratory in Sydney. One
culture was subsequently shown to be mixed, so 206 different isolates were
examined. Conventional serotyping (CS) was performed at the Streptococcus
Reference Laboratory, at ESR, Wellington, New Zealand, and MS at the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia.

The two panels of GBS reference strains and 63 selected clinical isolates
were studied in more detail, by sequencing >2200 base pairs (bp) of each to
identify appropriate sequences for use in MS. These and the remaining clinical
isolates were then used to evaluate the MS method and compare results with
those of CS. Typing by both methods was done initially without knowledge of
results of the other.

Bacterial isolates were retrieved from storage by subculture on blood agar
plates (Columbia II agar base supplemented with 5% horse blood) and incubated
overnight at 37°C.

Invasive GBS clinical isolates

All 194 isolates used in the study of mobile genetic elements were
recovered from the blood (177) or CSF (17) of 191 patients (107 female, 80 male,
four sex unrecorded; three cultures each contained mixed growth of two GBS
serotypes). 108 isolates were from specimens submitted for culture to the Centre
for Infectious Diseases and Microbiology Laboratory Services, ICPMR, Sydney,
Australia during 1996-2001 and 83 were referred to the Institute of Environmental
Science and Research (ESR), Porirua, Wellington, New Zealand for serotyping,
from various diagnostic laboratories in New Zealand, during 1994-2000.

Patients were classified into age-groups for analysis of genotype
distribution as follows: neonatal, early onset (0-6 days); neonatal, late onset (7
days to 3 months); infant and child (4 months-14 years); young adult (15-45
years); middle-aged (46-60 years); elderly (>60 years).
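
The age-group rule above can be sketched as a small function. This is only an illustration: the function name, the use of age in days as input, and the conversion factors (30.4375 days per month, 365.25 days per year) are assumptions that do not come from the specification, and ages are interpreted in completed months and years.

```python
DAYS_PER_MONTH = 30.4375  # assumed average month length
DAYS_PER_YEAR = 365.25    # assumed average year length


def classify_age_group(age_days: float) -> str:
    """Map an age in days to one of the six age-groups listed in the text."""
    if age_days < 0:
        raise ValueError("age cannot be negative")
    if age_days < 7:                      # 0-6 days
        return "neonatal, early onset"
    if age_days < 4 * DAYS_PER_MONTH:     # 7 days to 3 completed months
        return "neonatal, late onset"
    if age_days < 15 * DAYS_PER_YEAR:     # 4 months to 14 completed years
        return "infant and child"
    if age_days < 46 * DAYS_PER_YEAR:     # 15-45 completed years
        return "young adult"
    if age_days < 61 * DAYS_PER_YEAR:     # 46-60 completed years
        return "middle-aged"
    return "elderly"                      # more than 60 years
```

A record holding age in years can be passed as `classify_age_group(years * DAYS_PER_YEAR)`.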

These isolates are mainly a subset of the isolates described above but 
with reference strains and non-invasive isolates excluded. 

Conventional serotyping (CS).

CS was performed using standard methodology (Wilkinson and Moody,
1969). Briefly, an acid-heated (56°C) extract was prepared for each isolate and
the serotype determined by immuno-precipitation of type-specific antiserum in
agarose. An isolate was considered positive for a particular serotype when the
precipitation occurring formed a line of identity with that of the control strain.
Antisera used were prepared at ESR in rabbits against serotypes Ia, Ib, Ic, II, III,
IV, V and the R protein antigen. Fourteen selected isolates, including six that
were nontypable using antisera against serotypes I-V, six that initially gave
discrepant results between CS and MS and two separate isolates from a mixed
culture, were kindly tested using antisera against all serotypes by Abbie Weisner
and Dr. Androulla Efstratiou at Central Public Health Laboratory, Colindale,
London, UK.

Molecular serotype identification (MS); development of method.

Oligonucleotide primers.

The oligonucleotide primers used in this study, their target sites and
melting temperatures are shown in Tables 2, 6 and 10. Their specificities and
expected lengths of amplicons are shown in Tables 3, 7 and 11. The primers
were synthesised according to our specifications by Sigma-Aldrich (Castle Hill,
NSW, Australia). Four previously published oligonucleotide primers and a series
of new primers designed by us were used to sequence the genes of interest,
namely the 16S/23S rRNA intergenic spacer region and partial cps gene cluster, or
to amplify unique sequences of individual GBS serotypes. Six previously published
oligonucleotide primers and a series of new primers designed by us were used to
sequence parts of and/or to specifically amplify genes encoding GBS surface
proteins. We also designed a series of primers to sequence parts of and/or to
specifically amplify five known GBS mobile genetic elements. Some were
designed with high melting temperatures (>70°C) to be used in rapid cycle PCR.

DNA preparation and polymerase chain reaction (PCR).

Five individual GBS colonies or a sweep of culture were sampled using a
disposable loop and resuspended in 1 ml of digestion buffer (10 mM Tris-HCl, pH
8.0, 0.45% Triton X-100 and 0.45% Tween 20) in 2 ml Eppendorf tubes. The
tubes containing GBS suspension were heated at 100°C (dry block heater or
water bath) for 10 minutes then quenched on ice and centrifuged for 2 minutes at
14,000 rpm to pellet the cell debris. 5 µl of each supernatant containing
extracted DNA was used as template for PCR (Mawn et al., 1993).

PCR systems (25 µl for detection only, 50 µl for detection and
sequencing) were used as previously described (Kong et al., 1999). The
denaturation, annealing and elongation temperatures and times used were 96°C
for 1 second, 55-72°C (according to the primer Tm values or as previously
described) for 1 second and 74°C for 1 to 30 seconds (according to the length of
amplicons), respectively, for 35 cycles.

10 µl of PCR products were analysed by electrophoresis on 1.5% agarose
gels, which were stained with 0.5 µg ethidium bromide ml⁻¹. For
detection and/or serotype identification, the presence of PCR amplicons of
expected length, shown by ultraviolet transillumination, was accepted as
positive. For sequencing, 40 µl volumes of PCR products were further purified by
the polyethylene glycol precipitation method (Ahmet et al., 1999).

Sequencing.

The PCR products were sequenced using Applied Biosystems (ABI) Taq
DyeDeoxy terminator cycle-sequencing kits according to standard protocols. The
corresponding amplification primers or inner primers were used as the
sequencing primers.

Multiple sequence alignments and sequence comparison.

Multiple sequence alignments were performed with the Pileup and Pretty
programs in the Multiple Sequence Analysis program group. Sequences were
compared using the Bestfit program in the Comparison program group. All programs
are provided in WebANGIS, ANGIS (Australian National Genomic Information
Service), 3rd version.

Surface protein gene profile codes

Each isolate was given a protein gene profile code according to positive
PCR results using various primer pairs, as shown in Table 7.

Nucleotide sequence accession numbers.

The new sequence data described have been submitted to the GenBank
Nucleotide Sequence Databases and allocated the following accession numbers:
AF291411-AF291419 (16S/23S rRNA intergenic spacer regions for serotypes Ia
to VIII reference strains from reference panel 1); AF332893-AF332917,
AF363032-AF363060, AF367973, AF381030 and AF381031 (partial cps gene
clusters for two panels of reference strains (Table ) and selected representative
clinical isolates); AF367974 (partial bac gene sequence, with an insertion
sequence IS1381 from one isolate), AF362685-AF362704 (partial bac gene
sequences for all bac-positive isolates) and AF373214 (partial rib-like gene for
reference strain Prague 25/60, an R protein standard strain).

Previously reported sequence data referred to herein have appeared in the
GenBank Nucleotide Sequence Databases with the following accession numbers:
AB023574 (16S rRNA gene); U39765, L31412 (16S/23S rRNA intergenic spacer
regions); X68427 (S. oralis 23S rRNA gene); X72754 (cfb gene); AB028896 (cps
gene cluster for serotype Ia); AB050723 (partial cps gene cluster for serotype Ib);
AF163833 (cps gene cluster for serotype III); AF355776 (cps gene cluster for
serotype IV); AF349539 (cps gene cluster for serotype V); AF337958 (cps gene
cluster for serotype VI); M97256 (bca gene); X58470, X59771 (bac gene);
U58333 (rib gene); AF208158 (alp2 gene), AF291065-AF291072 (alp3 gene);
AF064785 (IS1381); M22449 (IS861); Y14270 (IS1548);
AF165983 (ISSa4); and AJ292930 (GBSi1).

Statistical analysis and dendrogram.

SPSS version 11 software was used for statistical analysis. A dendrogram
was formed using Average Linkage (between groups) and Hierarchical Cluster
Analysis in SPSS version 11 software. The presence or absence of each marker -
MS Ia, Ib, II, IV-VI; sst III-1 to III-4; pgp "A", "R", "a", "as", "alp2", "alp3"; bac
subgroups 1, 1a, 2, 3, 3a, 3b, 3c, 4, 4b, 5a, 7, 7a, 8, 9, 9a, 10, n1, n2; and mge
IS1381, IS861, IS1548, ISSa4, GBSi1 - was included in the analysis. The genotypes
were each characterized by a distinct combination of the molecular serotyping (MS)
or sst, pgp and mge.
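
As a rough illustration of between-groups average linkage on presence/absence markers, the following pure-Python sketch clusters a few made-up marker profiles. It is not the SPSS procedure itself: the distance measure (the count of markers present in exactly one profile, which equals the squared Euclidean distance for 0/1 data) is an assumption, and the function names and example profiles are hypothetical.

```python
from itertools import combinations


def marker_distance(a, b):
    """Number of markers present in exactly one of the two profiles."""
    return len(a ^ b)


def cluster_distance(profiles, c1, c2):
    """Between-groups average linkage: mean distance over all cross pairs."""
    total = sum(marker_distance(profiles[x], profiles[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def average_linkage(profiles):
    """Agglomerative clustering of marker profiles.

    profiles: dict mapping genotype name -> set of markers present.
    Returns the merges as (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(profiles, *pair),
        )
        merges.append((c1, c2, cluster_distance(profiles, c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical profiles, for illustration only (not data from the study).
example_profiles = {
    "g1": {"MS Ia", "alp2", "IS1381"},
    "g2": {"MS Ia", "alp2"},
    "g3": {"MS III", "R", "IS861", "GBSi1"},
}
example_merges = average_linkage(example_profiles)
```

Cutting the resulting merge list at a chosen distance yields clusters, in the same way that the dendrogram is cut at particular distances to define clusters and cluster groups.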

Example 1 - Study of inter- and intra-serotype/serosubtype sequence
heterogeneity in specific regions of the GBS genome and assessment of
suitability for molecular serotyping/serosubtyping.

Polymerase chain reaction.

With two exceptions, all GBS-specific primer pairs produced amplicons of
the expected size from all reference strains and clinical isolates tested (Table 3).
The exceptions were Sag59/Sag190 and CFBS/CFBA. Both target the cfb gene,
but failed to produce amplicons from one clinical isolate, despite repeated
attempts. We assumed that this isolate either lacked the cfb gene or that the
gene was present in a mutant form. It has been suggested previously that PCR
targeting the cfb gene will not identify all GBS isolates (Hassan et al., 2000) and
that another primer pair based on the 16S rRNA gene, DSF2/DSR1 (Ahmet et al.,
1999), was not entirely specific. Therefore, in this study, we used both primer
pairs (DSF2/DSR1 and Sag59/Sag190) to confirm that all the isolates were GBS.

Sequence heterogeneity of 16S/23S rRNA intergenic spacer regions.

The 16S/23S rRNA intergenic spacer regions were sequenced for the
serotypes Ia to VIII from reference panel 1. Multiple sequence alignment showed
differences between serotypes at only two positions: 207 (serotype V is T or C
[T/C], serotypes VII and VIII are C, others are T) and 272 (serotype III is T, others
G). These regions are therefore unsuitable for MS.
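
Finding such heterogeneity sites amounts to scanning the columns of a multiple sequence alignment for positions where the sequences disagree. The sketch below shows the idea on toy sequences; the function name and the example data are hypothetical, positions are 1-based, and each character is compared literally (mixed bases such as T/C would need extra handling).

```python
def variable_sites(aligned):
    """Return {1-based position: {name: base}} for alignment columns that differ.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("aligned sequences must all have the same length")
    sites = {}
    for i in range(length):
        column = {name: aligned[name][i].upper() for name in names}
        if len(set(column.values())) > 1:   # more than one base in this column
            sites[i + 1] = column
    return sites


# Toy alignment, for illustration only (not sequence data from the study).
toy_alignment = {"type_x": "ACGTAC", "type_y": "ACTTAC", "type_z": "ACGTAT"}
toy_sites = variable_sites(toy_alignment)
```

A region with only a handful of such columns, as found for the spacer regions, offers too few distinguishing positions to separate the serotypes.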

Sequence heterogeneity at the 3'-end of cpsD-cpsE-cpsF and the 5'-end of
cpsG.

Using a series of primers targeting the 3'-end of cpsD-cpsE-cpsF and the
5'-end of cpsG, we amplified and sequenced 2226 or 2217 bp - depending on the
presence or absence of a nine-base repetitive sequence - from both panels of
reference strains (serotypes Ia to VII) and 63 selected clinical isolates.
Representative sequences were deposited into GenBank. See Table 1 for
GenBank accession numbers of reference panel strains.

Repetitive sequence.

At the 3'-end region of cpsD, we found a nine-base repetitive sequence (TTA
CGG CGA) in most isolates of MS Ia and II, some of MS III, all of MS IV, V, and
VII, but none of the isolates of MS Ib or VI examined (Table 4). The presence or
absence of this repetitive sequence can be used to further subtype MS Ia, II and
III (see below).

Intra-serotype heterogeneity.

In general, intra-serotype heterogeneity was low - there were minor random
variations in a few isolates of all serotypes except MS III, in which the
intra-serotype heterogeneity was more complex. MS III could be divided into four
sequence subtypes on the basis of heterogeneity at 22 positions - 62, 139, 144,
204, 300, 321, 429, 437, 457, 486, 602, 636, 971, 1026, 1194, 1413, 1501,
1512, 1518, 1527, 1629, and 2134 - and the presence or absence of the repetitive
sequence (at 78-86) (Table 4).

Among 60 MS III isolates (58 clinical isolates and two reference strains),
serosubtypes III-1 (30 isolates) and III-2 (22 isolates) were predominant. The
repetitive sequence was present in serosubtype III-1 but not III-2; there were
differences at seven other sites (139, 144, 204, 300, 321, 636, and 1629).

There were five isolates belonging to serosubtype III-3, which contained
the repetitive sequence and were identical with serosubtype III-1 at three variable
sites (139, 144, and 300) and with serosubtype III-2 at four (204, 321, 636 and
1629). Serosubtype III-3 differed from both serosubtypes III-1 and III-2 at seven
sites (486, 1026, 1413, 1512, 1518, 1527, and 2134). These seven sites in
serosubtype III-3 were identical with the corresponding sites of MS Ia.

There were three serosubtype III-4 isolates, whose sequences were nearly
identical with the corresponding sequence of MS II. The only exception was at
position 437, where the nucleotide was T in serosubtype III-4 (as in MS VII), and
C in MS II. This difference can be used (in addition to PCR, see below) to
differentiate serosubtype III-4 from MS II. Two serosubtype III-4 isolates
contained the repetitive sequence, and the other did not. Because of the small
number of serosubtype III-4 isolates, we did not use the repetitive sequence to
subtype them further.

Inter-serotype heterogeneity.

There were 56 sites of heterogeneity between the eight MS (Table 4). The
most suitable sites, for use in PCR/sequencing for MS, were a group of 23 sites
nearest to the 3'-end of the region (Table 4, Figure 1). Firstly, they were
consistent across two panels of reference strains and most clinical isolates (the
only exceptions were the small number of serosubtype III-3 and III-4 isolates,
see below). Secondly, they were relatively concentrated within a 790 bp region,
which is a convenient length for sequencing in a single reaction. Thirdly, they
contained enough heterogeneity sites to allow differentiation, with few exceptions,
of MS Ia-VII. Based only on this 790 bp region, serosubtype III-3 cannot be
distinguished from MS Ia, nor serosubtype III-4 from MS II. However, they can be
identified by MS III-specific PCR (see below).

Serotype VIII does not form amplicons with primer pairs targeting the 790
bp region, but can be identified by exclusion after PCR identification of GBS. In
this study, one MS VIII isolate was identified, for which none of the primer pairs
that amplify the 2226 bp region (in addition to those that amplify the 790 bp
region) produced amplicons. This result was confirmed by the use of serotype
VIII-specific antiserum.

Mixed serotype-specificities in single isolates.

Eleven isolates were identified as one MS on the basis of the MS-specific
PCR and overall sequence (within the 2226/2217 bp segment) but their
sequences differed at some sites from isolates of the same MS and shared
site-specific characteristics of another. They included five serosubtype III-3
isolates and three serosubtype III-4 (see above). One non-serotypable reference
strain (Prague 25/60), which was identified as MS II, differed from other MS II
isolates at five sites at the 5'-end of the region, and was identical with MS III at
three of these sites. MS III-specific PCR was negative for Prague 25/60. One
clinical isolate identified as CS II, and MS II on the basis of its overall sequence,
had bases at nine sites at the 5'-end of the region that were characteristic of
serotype Ib; MS Ib-specific PCR was negative. Finally, one CS V reference strain
(Prague 10/84) had the same sequencing result as the corresponding sequence in
GenBank (AF349539), but both were different, at three sites at the 5'-end of the
region, from sequences of the other MS V strains that we studied.

All of these mixed-serotype specificities, except for those associated with
serosubtypes III-3 and III-4, occurred at the 5'-end region of the 2226/2217 bp
fragment. This supported our selection of the 790 bp 3'-end as the sequencing
target for MS. Using this target, all MS were correctly identified except for MS III
belonging to serosubtypes III-3 and III-4, which can be identified by MS
III-specific PCR (see Example 2).

Example 2 - Molecular serotype identification (MS) based on MS-specific
PCR targeting the 3'-end of cpsG-cpsH-cpsI/cpsM.

Our sequence alignment results showed that there was significant
sequence heterogeneity in the 3'-end of cpsG-cpsH-cpsI/cpsM (Figure 3), which
makes it appropriate for use in the design of specific primer pairs for
differentiation of serotypes Ia, Ib, III, IV, V, and VI directly by PCR. To fulfil
possible additional future requirements - for example, development of multiplex
PCR and/or further evaluation of the sequence typing method - we
designed several primer pairs for each serotype (Tables 2 & 3). Using two panels
of reference strains and the specified conditions, all primer pairs amplified DNA
only from the corresponding serotypes. When clinical isolates were tested, similar
results were obtained with two sets of MS-specific primer pairs. In general, more
stringent conditions (lower primer concentration, higher annealing temperatures)
could be used with primers generating smaller amplicons. Those selected for MS
are shown in Table 3 and Figure 2.
A MS was assigned, by PCR, to 179 of 206 (86.9%) clinical isolates as
follows: MS Ia 40; MS Ib 35; MS III 58 (including those previously identified as
serosubtypes III-3 and III-4); MS IV 7; MS V 36; MS VI 3.

Example 3 - Comparison of serotype identification results between MS
and CS.

After CS and MS had been completed, the results were compared. Initial
results were discrepant for 15 isolates, all but five of which (see below) were
resolved by retesting and/or correction of clerical errors.

The CS and MS/sequence subtyping results are shown in Table 5. A MS
was assigned to all isolates by PCR and/or sequencing, compared with 188 of
206 (91.3%) by CS. Specific PCR has not yet been developed for MS II and VIII,
so all MS II isolates were determined by sequencing only and one presumptive
MS VIII isolate was decided by exclusion (see Example 1). For all other isolates,
the results of PCR and sequencing were consistent, except for serosubtypes III-3
and III-4 and other minor sequence differences described above (Example 1). CS
results correlated well with PCR results.

Final CS and MS results were the same for all 188 isolates (100%) for
which results for both methods were available. Eighteen clinical isolates that were
non-serotypable by CS were assigned MS as follows: Ia, two; Ib, five; II, one;
serosubtype III-1, three; serosubtype III-2, one; V, five; and VI, one.
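
The counts reported in Examples 2 and 3 can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names are arbitrary, and the first of the two MS III serosubtypes in the non-serotypable list is read as III-1, which is an assumption about a garbled label.

```python
TOTAL_ISOLATES = 206


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# MS assigned directly by PCR (Example 2).
ms_by_pcr = {"Ia": 40, "Ib": 35, "III": 58, "IV": 7, "V": 36, "VI": 3}
assigned_by_pcr = sum(ms_by_pcr.values())

# Isolates typed by conventional serotyping (Example 3).
typed_by_cs = 188

# MS results for isolates that were non-serotypable by CS (Example 3).
ms_of_cs_nontypable = {
    "Ia": 2, "Ib": 5, "II": 1, "III-1": 3, "III-2": 1, "V": 5, "VI": 1,
}
nontypable_by_cs = sum(ms_of_cs_nontypable.values())

pcr_rate = percent(assigned_by_pcr, TOTAL_ISOLATES)
cs_rate = percent(typed_by_cs, TOTAL_ISOLATES)
```

The totals agree with the text: 179 isolates typed by PCR (86.9%), 188 typed by CS (91.3%), and 188 + 18 = 206 isolates overall.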

Sequences (2217 bp) of three clinical isolates that we identified as MS VI
were identical with those for serotype VI reference strains and the corresponding
sequence in GenBank (AF337958).

Mixed culture.

Four clinical isolates gave positive results with MS III-specific PCR, but
were provisionally identified as MS II by sequencing. Three were CS III and one
CS II, with a weak cross-reaction with serotype III antiserum. These isolates were
studied further by subculturing 12 individual colonies of each. All subcultures
were tested by MS III-specific PCR. All 12 colony subcultures of the three CS III
isolates were positive by MS III-specific PCR and the isolates were therefore
classified as serosubtype III-4 (see above). However, 11 of 12 colony subcultures
of the fourth isolate were negative by MS III-specific PCR, and one was positive
by MS III-specific PCR. It was therefore assumed that this was a mixed culture,
predominantly of MS/CS II. The one MS III-specific PCR positive colony was
subsequently identified as serosubtype III-2 and included as an additional clinical
isolate (total 206 in all).

Example 4 - Algorithm for serotype assignment of GBS by PCR and
sequencing

As an example of how the PCR and sequencing methods described above
may be used clinically to perform GBS serotype identification, we designed an
algorithm for clinical use. All the primers (except the inner sequencing primers)
used were given high melting temperature (>70°C), so rapid cycle PCR could be
used (Figure 2) (see Table 2 for primer sequences).
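
Figure 2 is not reproduced here, so the following is only a sketch of the decision logic as it can be pieced together from the text of Examples 1 to 3: GBS is first confirmed by PCR, MS-specific PCR identifies serotypes Ia, Ib, III, IV, V and VI, sequencing of the 790 bp region identifies the remaining types, and MS VIII is presumed by exclusion when that region gives no amplicon. The function name, its arguments and the handling of multiple positive PCRs are assumptions, not part of the specification.

```python
PCR_TYPEABLE = ("Ia", "Ib", "III", "IV", "V", "VI")


def assign_molecular_serotype(is_gbs, specific_pcr_positive, sequence_type=None):
    """Suggest a molecular serotype (MS) from PCR and sequencing results.

    is_gbs: True if GBS-specific PCR was positive.
    specific_pcr_positive: serotypes whose MS-specific PCR was positive.
    sequence_type: serotype suggested by sequencing the 790 bp region,
        or None if that region gave no amplicon.
    """
    if not is_gbs:
        return "not GBS"
    hits = sorted(set(specific_pcr_positive) & set(PCR_TYPEABLE))
    if len(hits) > 1:
        # Assumption: several positive MS-specific PCRs suggest a mixed culture.
        return "possible mixed culture"
    if len(hits) == 1:
        # MS-specific PCR takes precedence over sequence, so serosubtypes
        # III-3 and III-4 (which resemble Ia and II by sequence) are called III.
        return "MS " + hits[0]
    if sequence_type is None:
        return "MS VIII (presumptive, by exclusion)"
    return "MS " + sequence_type
```

For instance, an isolate whose 790 bp sequence resembles MS II but which is positive by MS III-specific PCR is returned as MS III, matching the treatment of serosubtype III-4 in the text.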

Example 5 - Identification of regions in the alp2, alp3 and rib genes suitable
for protein antigen gene specific subtyping

Polymerase chain reactions.

With few exceptions, all primer pairs produced amplicons of predicted
length from isolates giving positive results (Table 7). The exceptions included one
isolate that was positive by PCR using primer pairs GBS1360S/GBS1937A and
GBS1717S/GBS1937A (which both target the bac gene) but produced amplicons
significantly longer than those of other bac gene-positive isolates. Sequencing
showed that the amplicon contained the insertion sequence IS1381 with minor
variations compared with the published sequences (Tamura et al., 2000). The
amplicons produced using primers IgAagGBS/RIgAagGBS and IgAS1/IgAA1
(also targeting the bac gene) varied in length (Berner et al., 1999) and were
sequenced for further subtyping (see below and Table 8).

Amplicon sequencing results.

To confirm the specificity of selected primer pairs that we had designed or
modified, we sequenced 10 of 23 amplicons produced by bcaS1/bcaA (targeting
the 5'-end of the bca gene) and all of those produced by ribS1/ribA3 (targeting
the rib gene) and GBS1360S/GBS1937A (targeting the bac gene), from the two
panels of reference strains and 31 randomly selected clinical isolates.

All 10 amplicons of primers bcaS1/bcaA and 12 of 13 of primers
ribS1/ribA3 were identical with the corresponding gene sequences in GenBank
(M97256, bca gene and U58333, rib gene, respectively). One additional isolate,
namely Prague 25/60 in reference panel 2 (which is used to raise R antiserum),
produced an amplicon with primer pair ribS1/ribA3 only at a lower annealing
temperature (55°C) but not with ribS2/ribA1 and ribS2/ribA2. It was therefore
assumed not to contain the rib gene, although the amplicon sequence showed
considerable homology with the rib gene (71.4% or 66.6% according to whether or
not the primer sequences were included) (Figure 3). This isolate was the only
one, of 224 tested, for which PCRs were negative using ribS2/ribA1 and
ribS2/ribA2 but positive using ribS1/ribA3. The latter primer pair is assumed to be
not entirely specific for the rib gene and was therefore used only for sequencing.
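
A homology figure of this kind is typically a percent identity over an alignment, and it changes depending on which columns are counted, which is why including or excluding the primer regions gives two values. The sketch below shows one simple definition (identical columns divided by aligned columns, ignoring columns gapped in both sequences); this definition, the function name and the toy sequences are assumptions and need not match the Bestfit calculation used in the study.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same base.

    seq_a, seq_b: two aligned sequences of equal length (gaps as '-').
    Columns that are gaps in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy sequences, for illustration only (not data from the study).
full_identity = percent_identity("ACGTACGT", "ACGTTCGT")
# Excluding hypothetical 2-base primer regions at each end:
inner_identity = percent_identity("ACGTACGT"[2:-2], "ACGTTCGT"[2:-2])
```

Because primer-derived ends of an amplicon necessarily match the primers, excluding them generally lowers the identity, consistent with the lower of the two figures quoted above.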

Four of 10 amplicons of primer pair GBS1360S/GBS1937A (targeting the bac
gene) were identical with the corresponding sequence in GenBank (X58470,
X59771). A single point mutation (A to G, 1441 of X59771) was found in the
remaining six bac gene amplicons, including the one which contained the
insertion sequence IS1381 (see above and AF367974).

Amplicons from all of the 224 isolates that gave positive PCR results using
primer pairs bcaS1/balA (targeting alp2/alp3 genes), bal23S1/bal2A2 (targeting
alp2 gene) and IgAagGBS/RIgAagGBS (targeting bac gene) were sequenced.

Fifty isolates produced amplicons using primer pair bcaS1/balA. The
sequences of nine were identical with the corresponding portions of the published
sequence of alp2 gene (AF208158) and 41 with that of alp3 gene (AF291065).
There are two consistent heterogeneity sites between alp2 and alp3 genes in the
sequences of bcaS1/balA amplicons (Figure 4), which can be used to distinguish
them, in addition to alp2 and alp3 gene-specific PCR. All nine amplicons of
primer pair bal23S1/bal2A2 were identical with the corresponding portion of the
alp2 gene sequence in GenBank (AF208158).

The primer pair IgAagGBS/RlgAagGBS identified bac gene in 52 isolates.
There was considerable sequence variation, which allowed separation of bac
gene-positive isolates into 11 groups and 20 subgroups based on amplicon
length and sequence heterogeneity, respectively (Table 8). The groups contained
small numbers (one to five) of isolates except for B1 (20 isolates, 2 subgroups)
and B4 (11 isolates, 3 subgroups). The differences in amplicon length were
generally caused by the presence or absence of short repetitive sequences.
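
The two-level classification described above (groups by amplicon length,
subgroups by sequence heterogeneity within a length) can be sketched in Python
as follows. This is an illustration only: the isolate names and sequences are
invented, and treating any exact sequence difference as a new subgroup is an
assumption rather than the rule used to build Table 8.

```python
from collections import defaultdict


def group_amplicons(amplicons):
    """Group isolates by amplicon length, then subgroup by exact sequence.

    amplicons: dict mapping isolate name -> amplicon sequence.
    Returns a dict: length -> {sequence -> sorted list of isolate names}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for isolate, seq in amplicons.items():
        groups[len(seq)][seq].append(isolate)
    return {
        length: {seq: sorted(names) for seq, names in subs.items()}
        for length, subs in groups.items()
    }


# Hypothetical toy amplicons (not real bac gene sequences).
toy = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACGTACGT",
    "iso4": "ACGTACGTACGT",
}
grouped = group_amplicons(toy)
n_groups = len(grouped)
n_subgroups = sum(len(subs) for subs in grouped.values())
print(n_groups, "groups,", n_subgroups, "subgroups")
```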

Further confirmation of specificity of surface protein gene-specific primer 
pairs. 

To confirm primer specificity, we compared the results of PCR using the
primer sequences we had designed or modified for bac gene PCR, with those of
PCR using previously published primers and found 100% correlation.

The previously reported non-specificity of the published primer pair
bcaRUS/bcaRUA (targeting the bca gene repetitive unit) was confirmed. Using
these primers, all nine alp2 gene-positive (bcaS1/bcaA negative) isolates and 53
which were PCR negative using the primers bcaS1/bcaA, bcaS2/bcaA (targeting
the 5'-end of bca gene), bal23S1/bal2A2 and bal23S2/bal2A1 (targeting the
5'-end of alp2 gene) produced amplicons. Our sequencing showed that bca gene
and alp2 gene have significant homology in the regions targeted by
bcaRUS/bcaRUA, allowing amplicon formation from alp2 gene-positive strains.
These false positive results could be due to the presence of other C alpha-like
proteins, containing regions homologous with the bca gene repetitive unit (bca
gene repetitive unit-like sequence).

We also showed that the results of PCR using two or more primer pairs that
we had designed for individual genes (rib, alp2 and alp3 genes) correlated well,
supporting the specificity of each set. The only exception, as mentioned above,
was ribS1/ribA3, which produced a non-specific amplicon from one of 224
isolates tested.

Example 6 - The relationship between surface protein antigen gene profiles 
and cps serotypes/serosubtypes. 

Surface protein gene profiles. 

For each gene (except bca gene repetitive unit or bca gene repetitive
unit-like region), we selected two primer pairs to identify and characterise GBS
surface protein by PCR. Each isolate was given a protein gene profile code
according to PCR results as follows:

"A": 5'-end of bca gene amplified by bcaS1/bcaA and bcaS2/bcaA;
"a" or "as": bca gene repetitive unit or bca gene repetitive unit-like region
amplified by bcaRUS/bcaRUA, with multiple or single band amplicons,
respectively;
"B": bac gene amplified by GBS1360S/GBS1937A and
IgAagGBS/RlgAagGBS (>20 subgroups based on sequence
heterogeneity);
"R": rib gene amplified by ribS2/ribA1 and ribS2/ribA2;
"alp2": alp2 gene amplified by bal23S1/bal2A2 and bal23S2/bal2A1; and
"alp3": alp3 gene amplified by bal23S1/bal3A and bal23S2/bal3A
(Table 7).
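
The coding scheme above can be expressed as a small lookup, sketched here in
Python. Two points are assumptions rather than statements from the text: a
marker is scored only when both listed primer pairs are positive (the handling
of discordant pairs is not specified), and the markers are concatenated in the
order suggested by profile names such as "AaB", "RB" and "alp2as". The function
name and the input format are hypothetical.

```python
def protein_gene_profile(pcr, repeat_bands=0):
    """Build a protein gene profile code from PCR results.

    pcr: dict mapping primer-pair name -> True/False (amplicon obtained).
    repeat_bands: number of bands obtained with bcaRUS/bcaRUA (0 = negative).
    """
    def both(*pairs):
        # Assumption: a marker is called only if every listed pair is positive.
        return all(pcr.get(pair, False) for pair in pairs)

    code = ""
    if both("bcaS1/bcaA", "bcaS2/bcaA"):
        code += "A"
    if both("ribS2/ribA1", "ribS2/ribA2"):
        code += "R"
    if both("bal23S1/bal2A2", "bal23S2/bal2A1"):
        code += "alp2"
    if both("bal23S1/bal3A", "bal23S2/bal3A"):
        code += "alp3"
    if repeat_bands > 1:
        code += "a"       # multiple-band amplicon
    elif repeat_bands == 1:
        code += "as"      # single-band amplicon
    if both("GBS1360S/GBS1937A", "IgAagGBS/RlgAagGBS"):
        code += "B"
    return code or "none"


# Hypothetical isolate positive for bca (5'-end and repeats) and bac.
example = {
    "bcaS1/bcaA": True,
    "bcaS2/bcaA": True,
    "GBS1360S/GBS1937A": True,
    "IgAagGBS/RlgAagGBS": True,
}
print(protein_gene_profile(example, repeat_bands=3))
```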

Four common profiles accounted for 203 of 224 (90.6%) isolates: "R" (62
isolates), "AaB" (51 isolates), "a" (49 isolates) and "alp3" (41 isolates) (see
Table 4). Only two isolates contained no surface protein gene markers. All but
one isolate with the bac gene ("B") also had bca gene, with its repetitive unit
("Aa"); one had rib gene. All "alp2" isolates contained single bca repetitive
unit-like sequences ("as"). "A", "R", "alp2" and "alp3" were all mutually
exclusive. 62 of 63 isolates with rib gene ("R") and 41 of 41 isolates with alp3
gene had no other protein antigen markers.

The relationship between surface protein antigen gene profiles and cps
serotypes/serosubtypes.

A cps molecular serotype (MS) was assigned to all isolates in accordance
with the methods described in Examples 1 to 4 and the results correlated with
conventional serotyping (CS) results except for 19 of 224 isolates that were
nontypable using antisera. The relationships between surface protein gene
profiles and cps MS are summarised in Table 9.

The following strong associations were confirmed or demonstrated
between: MS Ia and bca gene repetitive unit or bca gene repetitive unit-like
sequence (most with profile "a"); MS serosubtypes III-1 and III-2 and rib gene; MS
serosubtype III-3 and alp2 gene; MS Ib and bca/bac genes and MS V and alp3
gene. MS II showed the most varied surface protein gene profiles. However, the
relationships were not absolute and different combinations of cps serotypes and
protein gene profiles produced 31 different serovariants or 51 when bac gene
("B") subgroups were considered.

Example 7 - The relationship between surface protein antigens and protein 
gene profiles. 

Based on conventional serotyping, 33 isolates (belonging to CS Ia/c, Ib/c, IIc,
IIb, IIIc or IIIb) reacted with the C antiserum. The surface protein gene profiles
of all these isolates contained bca gene ("A") or bca gene repetitive unit-related
markers ("a" or "as"): Aa, 3; AaB, 18; a, 11; alp2as, 1. Twenty-nine isolates
reacted with the R antiserum and, of these, 22 contained rib gene and six, alp3
gene. The strain used to raise the R protein antiserum (Prague 25/60) contained
a presumed rib-like gene (see above and Figure 3).

Example 8 - Identification of mobile genetic elements suitable for molecular 
subtyping 

We developed a series of PCR primers to screen for the presence of five
mobile elements in GBS serotypes.

Specificity of primer pairs.

All the primer pairs produced amplicons of the expected lengths (Table 11)
from some reference and/or some clinical isolates (Table 12). To evaluate the
specificity of our primer pairs, we sequenced all amplicons produced by primers
IS1548S/IS1548A3 and ISSa4S/ISSa4A2, and amplicons, selected from both
reference and clinical isolates, produced by IS861S/IS861A2 (12 isolates),
IS1381S1/IS1381A (24 isolates) and GBSi1S1/GBSi1A2 (11 isolates).

All 41 IS1548 and 15 ISSa4 amplicon sequences were identical with the
corresponding sequences in GenBank (Y14270 and AF165983, respectively).
Five of 12 IS861 amplicon sequences were identical with the corresponding
IS861 sequence in GenBank (M22449). The other seven differed, at position 732,
from the published sequence (G to A) and the reference strain Prague 25/60 had
two additional differences - G to A and T to A - at positions 576 and 830 of
M22449, respectively.
Previously, we found a full-length insertion sequence IS1381 (AF367974)
within C beta antigen gene of a clinical isolate, with several differences compared
with the original published sequence (AF064785): the terminal inverted repeats
contained 15, rather than 20 base pairs (bp); there was a three bp deletion and
four individual bp differences in the putative transposase pseudogene between
positions 419 to 429 (of the original GenBank sequence) - GGG ATC CGA TT
(AF064785) vs CAG A- -GG TA (AF367974; our sequence). All amplicons of
primer pair IS1381S1/IS1381A from 12 reference and 12 selected clinical isolates
were identical with each other and with that of our IS1381 sequence in GenBank
(AF367974) but different, as above, from the original reported IS1381 sequence
(AF064785).

The amplicons of primer pair GBSi1S1/GBSi1A2 from all four GBSi1-
positive reference strains and seven selected clinical isolates were sequenced.
Six (including those of three reference strains) were identical with the
corresponding GBSi1 sequence in GenBank (AJ292930). Amplicons from four
clinical isolates showed three site-variations (C to T at position 767, A to C at
position 846 and T to C at position 923 of AJ292930 sequence). The reference
strain Prague 25/60 showed only the first two of these site-variations.

In addition to sequencing, we evaluated the specificity of our primer pairs
by comparing PCR results for two or more primer pairs for each target (Table 11).
In all cases, the same sets of isolates gave positive results when tested with PCR
targeting the same mobile genetic elements, thus confirming the specificity of the
primer pairs.






PCR results using specific primer pairs for all five mobile genetic elements. 

IS861, IS1548, IS1381, ISSa4 and GBSi1 were identified in 55%, 18%, 85%,
7% and 19% of isolates, respectively. None of the mobile elements was detected in
10 (4%) isolates. The distributions of the five mobile elements identified by PCR in
the 224 GBS isolates tested in the previous examples are shown in Table 12. IS1381
was detected alone in 79 isolates and GBSi1 alone in one. Forty-six isolates
contained two different insertion sequences (IS861 and IS1381, 42 isolates; IS1548
and IS1381, three isolates; ISSa4 and IS1381, one isolate). Forty-four isolates
contained three (IS861, IS1548 and IS1381, 34; IS861, ISSa4 and IS1381, 10) and
one contained all four insertion sequences. Forty-one isolates contained GBSi1 in
combination with one (IS861, 22; IS1381, one isolate), two (IS861 and IS1381, 11;
ISSa4 and IS1381, three isolates) or three (IS861, IS1548 and IS1381, four isolates)
insertion sequences.

PCR results for the 194 invasive isolates using specific primer pairs for all
five mobile genetic elements.

The numbers of isolates containing different mobile genetic element (mge)
combinations (from none to four per isolate) are shown in Table 13. IS1381, IS861,
IS1548, ISSa4 and GBSi1 were identified in 87%, 52%, 17%, 6% and 18% of
isolates, respectively. Six (3%) isolates contained no mge.
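
Prevalence figures and combination counts of this kind can be derived from
per-isolate presence/absence results. The Python sketch below shows one way to
do so. The four toy isolates are invented and do not reproduce Table 13, and
the function name is hypothetical.

```python
from collections import Counter

ELEMENTS = ("IS1381", "IS861", "IS1548", "ISSa4", "GBSi1")


def summarise_mge(isolates):
    """Summarise mobile genetic element (mge) PCR results.

    isolates: list of sets, each holding the elements detected in one isolate.
    Returns (prevalence in percent per element, Counter of element combinations).
    """
    if not isolates:
        raise ValueError("need at least one isolate")
    n = len(isolates)
    prevalence = {
        e: round(100.0 * sum(e in iso for iso in isolates) / n, 1)
        for e in ELEMENTS
    }
    # Represent each combination as a tuple in a fixed element order.
    combos = Counter(tuple(e for e in ELEMENTS if e in iso) for iso in isolates)
    return prevalence, combos


# Hypothetical toy data: four isolates, one of them with no mge.
toy_isolates = [{"IS1381"}, {"IS1381", "IS861"}, {"IS861", "GBSi1"}, set()]
prevalence, combos = summarise_mge(toy_isolates)
print(prevalence)
print(combos)
```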

Example 9 - The relationships between cps serotypes, serosubtypes, surface 
protein gene profiles and mobile genetic elements. 

The distribution of each of the five mobile genetic elements in different cps
serotypes, serotype III subtypes and surface protein gene profiles is shown in
Tables 12 and 13. The most consistent findings for each sero/serosubtype were:

1) Serotype Ia - most (>80%) expressed proteins closely related to C alpha
protein and contained IS1381.

2) Serotype Ib - most (>90%) expressed C alpha and C beta proteins and
contained IS861 and IS1381.

3) Serotype II - exhibited two common patterns:

a) >50% expressed C alpha protein (and often C beta) and contained IS861,
IS1381 and sometimes other mobile elements, especially ISSa4; or

b) >25% expressed Rib protein and contained IS861, IS1381 and GBSi1.

4) Serosubtype III-1 - all expressed Rib protein and contained IS861, IS1548 and
IS1381 but not GBSi1.

5) Serosubtype III-2 - all expressed Rib protein and contained IS861 and GBSi1
but neither IS1548 nor IS1381.

6) Serosubtype III-3 - all expressed C alpha-like protein 2 and contained no
mobile genetic elements.

7) Serosubtype III-4 - expressed various proteins; all contained GBSi1.

8) Serotype IV - most expressed proteins closely related to C alpha protein
and contained IS1381.

9) Serotype V - most expressed C alpha-like protein 3 and contained IS1381.

10) GBSi1 and IS1548 were mutually exclusive in serotype III (III-1, III-2 and
III-4) but not in serotype II.

11) All isolates that expressed C alpha-like protein 2 contained no insertion
sequences.

Predominant relationships between MS/sst, pgp and mge. 

Figure 5 shows the relationships between the various genetic markers. IS1381
was present in nearly all isolates of MS Ia, Ib, IV, V and VI, but in none of sst III-2 or
III-3. IS1548 was found exclusively, and GBSi1 most commonly, in serotypes II or III;
three isolates (all MS II) contained both GBSi1 and IS1548. IS861 was found in all sst
III-1 and III-2 and most MS II and Ib isolates but only in 14% of other MS isolates.
ISSa4 was present in only 6% of isolates, more than half of which were MS II; it was
present in one invasive isolate obtained before 1996 (1994). IS1381 was found in
most isolates except those in cluster 8, pgp "alp2", which had no insertion sequences.
IS861 was found in most genotypes with pgp "AaB" (clusters 3 and 4) and all
genotypes with pgp "R" (clusters 6 and 7).


Genotypes based on MS/sst, pgp, bac subtypes and mge. 

MS/sst, pgp, bac subtype (for isolates with pgp "B") and the presence of
various combinations of mge provide a PCR/sequencing-based genotyping system.
The 194 invasive isolates in this study represented seven serotypes, ten MS/sst, 41
subtypes based on the distributions of pgp and mge or 56 genotypes when bac
subtypes (mainly in MS Ib) were included (Figure 5).

Theoretical GBS clonal population structure. 

Theoretically there are 13 possible GBS MS/sst (eight MS - Ia, Ib, II, IV-VIII;
four sst - III-1 to III-4; and cps gene cluster absent) and at least 10 pgp (none, "Aa",
"AaB", "a", "as", "R", "RB", "alp2as", "alp3" or "alp4a"). If the 22 bac subgroups
identified so far are included, there are up to 31 pgp. If the five mge were
independently and randomly distributed, each being either present or absent, there
would be 13 x 31 x 2^5 = 12,896 different possible combinations of molecular
markers. The fact that only 56 different combinations were found (Figure 5)
demonstrates that markers are not randomly distributed or, in other words, that
these invasive Australasian GBS isolates have a clonal population structure. It
is possible, but unlikely, that these isolates represent a very limited number of GBS
genotypes.
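
The arithmetic behind the theoretical number of marker combinations can be
checked with a few lines of Python. The counts are those stated in the
paragraph above; the variable names are illustrative only.

```python
n_ms_sst = 13   # possible MS/sst, as stated in the text
n_pgp = 31      # protein gene profiles when bac subgroups are included
n_mge = 5       # mobile genetic elements, each either present or absent

theoretical = n_ms_sst * n_pgp * 2 ** n_mge
observed = 56   # genotypes found among the 194 invasive isolates

print(theoretical)  # 12896
print(f"observed fraction of theoretical combinations: {observed / theoretical:.2%}")
```

Under the stated independence assumption, the observed genotypes occupy well
under 1% of the theoretical combinations.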

The phylogenetic relationship of Australasian invasive GBS. 

The 56 genotypes formed eight clusters, separated at a genetic distance of
about 16 (or three cluster groups separated at a distance of about 22.5). The pgp
was the main determinant of cluster separation (Figure 5). 94% of isolates
belonged to five MS (Ia, Ib, II, III and V), 62% belonged to five (9%) genotypes
(Ia-1, Ib-1, III-1, III-2, V-1) and 92% belonged to the five largest clusters (1, 2, 4, 6
and 7). Cluster group A, the largest, contained 139 (72%) isolates and 48 (86%)
genotypes, 45 of which contained fewer than five isolates, whereas cluster group
B contained 49 (25%) isolates and five (9%) genotypes.

The main characteristics of each cluster were as follows:
Cluster 1: "alp3", IS1381 (39 isolates, four MS, 11 genotypes; predominant
genotype V-1).
Cluster 2: "a" or "as", IS1381 (55 isolates, four MS, 12 genotypes; predominant
genotype Ia-1).
Cluster 3: "Aa" or "AaB", MS II, IS1381, IS861 (10 isolates, six genotypes).
Cluster 4: "AaB", IS1381, IS861 (35 isolates, two MS: VI or Ib; 18 genotypes;
predominant genotype Ib-1).
Cluster 5: "AaB", IS861, GBSi1, genotype III-4-1 (one isolate).
Cluster 6: "R", IS861 and GBSi1 (22 isolates, three MS/genotypes; predominant
genotype III-2).
Cluster 7: "R", IS1381 and IS861 (27 isolates; two MS/genotypes; predominant
genotype III-1).
Cluster 8: "alp2as", no IS (six isolates; three MS/genotypes; one contained
GBSi1).

The phylogenetic study showed that the dendrogram inferred by SSPS 
was very robust. 


The relationship between genotypes and GBS disease patterns. 

The distribution of MS and genotypes in different age groups of patients with
invasive GBS disease is shown in Table 14. All common MS were represented in
more than one patient group. However, there were highly significant associations
(when compared with all other age-groups) between sst III-2 and late onset neonatal
infection (p=0.0005) and MS V and infection in the elderly (p=0.001).

There were 17 isolates from cerebrospinal fluid specimens, nine (53%) of
which were MS III (from three different sst/genotypes, each in a different cluster). The
other eight isolates were distributed among five MS, seven genotypes and four
clusters. Meningitis occurred in all age-groups but comprised 23% of cases in the late
onset neonatal group compared with 5% in all other groups.



DISCUSSION 

Capsule production in GBS is controlled by the capsular polysaccharide
synthesis (cps) gene cluster, which had been sequenced for serotype Ia and
serotype III before we began our study. Corresponding sequences for serotype Ib
(Miyake et al., 2001, submitted to GenBank, GenBank accession number
AB050723), and for serotypes IV, V and VI (McKinnon et al., 2001, submitted to
GenBank, GenBank accession numbers AF355776, AF349539 and AF337958,
respectively) were released recently, when the project was nearly finished, but
the cps gene cluster sequences for the other three serotypes (II, VII and VIII)
have not been published previously.

The sequences of cps gene clusters for serotypes Ia and III showed
considerable homology at the 3'-end of cpsD-cpsE-cpsF-and the 5'-end of cpsG.
We designed a series of primers to amplify a 2226/2217 bp segment in this
region and found that amplicons were obtained from all serotypes except VIII.
This confirmed a previous suggestion that serotype VIII is significantly different
from other serotypes in this region.

Using eight serotype (Ia to VII) reference strains, we showed more than 50
heterogeneity points between serotypes (Figure 1, Table 4). Using 63 selected
clinical isolates that had been serotyped by conventional methods, we found that
these inter-serotype differences were generally consistent and specific, especially
the 23 sites clustered at the 3'-end of the regions. We used these differences to
assign serotypes to the remaining clinical isolates collected in this study, without
knowledge of the serotype obtained by conventional methods.
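
Assigning a serotype from serotype-specific heterogeneity sites amounts to
comparing the bases at a set of signature positions. The Python sketch below
illustrates the idea. The positions, bases and serotype labels ("X", "Y") are
invented placeholders, not the sites listed in Table 4, and the function name
is hypothetical.

```python
def assign_by_signature(sequence, signatures):
    """Return the single serotype whose signature sites all match, else None.

    signatures: dict mapping serotype label -> {0-based position: expected base}.
    """
    matches = [
        serotype
        for serotype, sites in signatures.items()
        if all(
            pos < len(sequence) and sequence[pos] == base
            for pos, base in sites.items()
        )
    ]
    # Ambiguous or absent matches are left unassigned.
    return matches[0] if len(matches) == 1 else None


# Hypothetical signatures for two made-up types.
TOY_SIGNATURES = {
    "X": {2: "A", 5: "G"},
    "Y": {2: "C", 5: "G"},
}
print(assign_by_signature("TTATTG", TOY_SIGNATURES))
print(assign_by_signature("TTGTTG", TOY_SIGNATURES))
```

Returning no assignment when zero or several signatures match keeps ambiguous
sequences visible for manual review instead of forcing a call.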

Sequence analysis of the 3'-end of cpsG-cpsH-cpsI/cpsM for serotypes Ia,
III, Ib, IV, V and VI showed that this region is highly variable (Figure 3), making
this region a suitable target for direct serotype identification by PCR. We
designed several pairs of MS-specific primers for MS Ia, Ib, III, IV, V and VI and
used them to test two CS reference panels. Selected primer pairs allowed MS to
be assigned, by PCR alone, to 86.9% of our 206 clinical isolates. Using rapid-cycle
MS-specific PCR, results are available within one working day. In future, it will be
possible to extend this method to all MS, when cps gene cluster sequences in
this region are available for serotypes II, VII and VIII. Meanwhile, MS II and VII
can be identified by sequencing the 790 bp PCR amplicons of the 3'-end of cpsE-
cpsF-the 5'-end of cpsG (Figure 1, Table 4). A positive GBS-specific PCR and
negative PCR results with all the primers that amplify the 790 bp region identified
MS VIII, by exclusion.
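
The assignment logic described in this paragraph can be summarised as a short
decision procedure, sketched here in Python. The function name, argument names
and the "unresolved" and "not GBS" return values are illustrative assumptions;
the text does not say how conflicting results were handled.

```python
MS_SPECIFIC = {"Ia", "Ib", "III", "IV", "V", "VI"}  # MS with specific primer pairs


def assign_ms(gbs_specific_positive, ms_specific_positive,
              amplicon_790_positive, sequence_call=None):
    """Assign a molecular serotype (MS) from PCR and sequencing results.

    gbs_specific_positive: result of the GBS-specific PCR.
    ms_specific_positive: iterable of MS whose specific PCR was positive.
    amplicon_790_positive: whether the 790 bp region could be amplified.
    sequence_call: MS inferred from sequencing the 790 bp amplicon, if done.
    """
    if not gbs_specific_positive:
        return "not GBS"
    hits = set(ms_specific_positive) & MS_SPECIFIC
    if len(hits) == 1:
        return next(iter(hits))
    if len(hits) > 1:
        return "unresolved"      # assumption: conflicting PCRs need review
    if not amplicon_790_positive:
        return "VIII"            # by exclusion, as described in the text
    if sequence_call in {"II", "VII"}:
        return sequence_call
    return "unresolved"


print(assign_ms(True, ["III"], True))
print(assign_ms(True, [], True, sequence_call="II"))
print(assign_ms(True, [], False))
```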

In future, and in some laboratories currently, sequencing of the 790 bp
PCR amplicons of the 3'-end of cpsE-cpsF-the 5'-end of cpsG for all isolates may
be more convenient, as only one method and fewer primers are needed.
However, if sequencing is not available in-house, the turn-around time is longer
and a small proportion of serotypes would be wrongly assigned (serosubtypes
III-3 and III-4 as MS Ia and II, respectively). This could be avoided by screening
with MS III-specific PCR first. Sequencing the 790 bp PCR amplicon allows MS III
to be subtyped on the basis of the sequence heterogeneity.

Previous studies have shown that serotypes Ia, Ib, II, III and V are those
most frequently isolated from normally sterile sites, in the United States and
several countries. Serotypes VI and VIII are the predominant serotypes isolated
from patients in Japan, but are uncommon elsewhere. Although our isolates were
selected, they were probably representative of those causing disease in
Australasia; Ia, Ib, II, III and V were the most common serotypes identified,
although there were small numbers of serotypes IV, VI and VIII.

Up to 13% of GBS isolates are non-serotypable and in our study the
proportion was 8.7% (18/206) using the antisera available. This may be due to
decreased type-specific antigen synthesis; non-encapsulated phase variation; or
insertion or mutation in genes of cps gene clusters. One non-serotypable GBS
strain in our study had a T base deletion in cpsG gene, which caused a change in
the cpsG gene reading frame.
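
The effect of a single-base deletion on a reading frame can be illustrated with
a toy sequence. The Python sketch below uses an invented 15-base sequence, not
the cpsG gene, and hypothetical helper names. In this toy case the shifted
frame happens to contain a TAG stop codon; the text makes no such claim about
the cpsG mutation.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing 1-2 bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete_base(seq, index):
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


wild_type = "ATGGCTTTAGGACCA"        # hypothetical sequence, not cpsG
mutant = delete_base(wild_type, 6)   # delete the T at 0-based index 6

print(codons(wild_type))
print(codons(mutant))
```

Every codon downstream of the deletion is read in a different frame, which is
why a one-base deletion can abolish production of the encoded protein.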

We have also developed PCR-based methods to identify GBS surface
protein genes and further characterise these isolates. Using the published bac
gene sequence, we modified bac gene-specific primers and designed new
primers, with high melting temperatures (>70 °C) suitable for rapid cycle PCR
targeting all major surface protein genes.

As previously reported, a published PCR primer pair targeting the bca
gene repetitive unit (at the 3'-end of bca gene) was not entirely specific for bca
gene. We designed two new primer pairs targeting the 5'-end of bca gene, to
improve the specificity. However, very few serotype Ia strains gave positive
results using these primers whereas all were PCR positive using primers
targeting the bca gene repetitive unit. These results were consistent with a
previous report, that a probe targeting the 5'-end of bca gene hybridized with only
one of nine serotype Ia strains, but a large bca gene probe, including the tandem
repeat region, hybridized with all nine strains.

PCR specific for rib, alp2 and alp3 genes has not been described
previously. The primer pairs we designed mainly targeted the 5'-ends of the gene
and were chosen after comparing the gene heterogeneity with related gene
sequences. We designed two or more primer pairs for each gene to check primer
specificity by comparison of results of different PCR targeting the same genes.
Protein gene profiles "alp2" and "alp3" were distinguished on the basis of the alp2
and alp3 gene-specific PCR and/or two sequence heterogeneity sites in the
amplicons of bcaS1/balA or bcaS2/balA.

To confirm the specificity of our primers, we used them to examine two
reference panels and selected GBS isolates. The longest amplicons produced by
PCR for each gene were sequenced, to provide maximal sequence information
and ensure that the inner primers were not located at strain heterogeneity sites.
Our sequencing results confirmed the specificity of the primers. Two pairs of
primers for each gene were compared, with similar results. Finally, six
gene/region-specific primer pairs (including the one targeting the bca gene
repetitive unit) were used to define protein antigen gene profiles for all 224
isolates.

The study showed that only one member of the surface protein gene family
containing repetitive sequences - rib, bca, alp2 and alp3 genes - could be present
in any single isolate. However, all isolates containing bac gene, which is not a
member of the surface protein gene family containing repetitive sequences, also
contained either bca gene (51/52) or rib gene (1/52).
Bac gene was present in 23% of isolates, a similar proportion to that
(19-22%) previously reported. In common with others, we found variations in the
bac gene due to variable small internal repetitive sequences. These bac gene
repetitive sequences were irregular (unlike those of the bca-rib gene family).
Their role is not clear, but they are potentially useful molecular markers for
epidemiological studies.

Our data show that some serotype III isolates (our MS serosubtypes III-1
and III-2) were closely associated with rib gene, and others (our MS serosubtype
III-3) with alp2 gene. Serotype Ib was associated with bca and bac genes and
serotype V with alp3 gene. However, as the relationship was not absolute,
different combinations of cps serotypes-serosubtypes/protein gene profiles
identified many serovariants, which will be useful in epidemiological studies and
in formulation of conjugate vaccines. Based on PCR only, we were able to divide
our 224 isolates into 31 serovariants based on bac gene (B) groups or 51, based
on subgroups. Theoretically, there are likely to be additional serovariants.

We found that the antisera to "C" and "R" protein antigens were not entirely
specific for any particular protein genes. However, reaction with "C" antiserum
mostly reflected the presence of genes encoding C alpha (bca gene) and related
protein antigens (at least including alp2 gene) and the antiserum to "R" with those
encoding Rib (rib gene) and related proteins (at least including alp3 gene, and the
rare presumed rib-like gene).

We have also investigated the presence of a number of mobile elements in
different serotypes of GBS. Four different insertion sequences have been
identified previously in GBS. Multiple copies of IS861 in some serotype III
isolates were associated with increased capsule gene expression. We found
IS861 in all serosubtypes III-1 and III-2 and most serotype II and Ib isolates but
few others. All IS861-containing isolates contained at least one additional mobile
element.

Multiple copies of IS1381 have been found in a high proportion of GBS and
other Streptococcus species, including S. pneumoniae, and have been used as
probes for restriction fragment length polymorphism (RFLP) analysis of GBS for
epidemiological studies (Tamura et al., 2000). We found IS1381 in 85% of
isolates overall. It was present in all isolates of serosubtype III-1 but none of
serosubtypes III-2 or III-3. Our IS1381 sequences, from 24 isolates, were identical
with each other, but differed at several sites from that previously described
(AF064785). The significance of these differences is unknown, but it emphasizes
the importance of confirming sequences from as many different strains as
possible.

ISSa4 was first identified in a nonhemolytic GBS isolate, in which it caused
insertional inactivation of the gene cylB, which is part of an ABC transporter
involved in production of hemolysin. Only a small proportion of (mainly hemolytic)
GBS isolates (4%) contained ISSa4, all of which had been isolated since 1996
and it was postulated that ISSa4 had been newly acquired by GBS. We also
found ISSa4 in only a small proportion of isolates (7%) but it was present in
similar proportions of clinical isolates obtained before (4 of 44) and during or after
(11 of 162) 1996.

IS1548 was first discovered in some hyaluronidase-negative GBS
serotype III isolates, in which it caused insertional inactivation of the gene hylB
(one of a cluster responsible for production of hyaluronidase, an important GBS
virulence factor) (Granlund et al., 1998). A copy of IS1548 is also found
downstream of the C5a peptidase gene (also associated with virulence), in
isolates that contain it. Most IS1548-containing isolates were from patients with
endocarditis and it was postulated that inactivation of hyaluronidase production
and/or some effect on C5a peptidase may allow GBS isolates to adhere to and
survive on heart valves.

We found IS1548 in all serosubtype III-1 isolates, which represented 52%
of 58 serotype III isolates in our collection, from superficial (eight of 12) and
normally sterile (22 of 46) specimens. The latter were from neonates (seven of
20), adults (three of six) and subjects of unspecified age (12 of 20) (data not
shown). Although specific clinical data were unavailable, GBS endocarditis is
uncommon and likely to have been present in few, if any, of these subjects.
Further study is required to elucidate the association of this insertion sequence
with specific virulence factors and clinical syndromes.

We found GBSi1, a group II intron, in 19% of our 224 isolates overall; it
was commonly associated with IS861, and the distribution varied with
serotype/serosubtype. It was rarely found in serotypes other than II and III. It was
present in more than 50% of serotype II isolates, including four, which also
contained IS1548. It was found in all serosubtypes III-2 and III-4 isolates, in
which IS1548 was not found, but in no serosubtype III-1 isolates which did
contain IS1548 or serosubtype III-3 isolates which did not.

Our subdivision of GBS serotype III into four serosubtypes, based on
differences within the cps gene cluster, was supported by corresponding
differences in surface protein gene profiles and distribution of the five mobile
elements described in this study. Although we did not test our isolates for
hyaluronidase activity, it is likely that our serosubtype III-1, which expresses Rib
protein and contains IS1548, IS861 and IS1381, corresponds with the
hyaluronidase negative subtype III-2, described by Bohnsack et al., 2001. Our
serosubtype III-2 also expresses Rib protein and contains IS861 and GBSi1 and
probably corresponds with subtype III-3 of Bohnsack et al., 2001. Serosubtypes
III-3 and III-4 were represented by relatively few isolates. The former (in common
with some serotype Ia isolates) expressed the C alpha-like protein 2 and
contained no mobile elements (an otherwise uncommon finding). The latter is
closely related to serotype II, with which it shares sequence homology in a
section of the cps gene cluster and various surface protein profiles and mobile
elements.

Summary

Our aim has been to develop a comprehensive genotyping system for group B
streptococcus (GBS). Such a system should ideally be reproducible, objective and
transportable between laboratories, comparable with and complementary to other
typing methods and able to incorporate known virulence markers. Based on these
criteria, we first developed a molecular serotyping (MS) method based on the cps
gene cluster. It compared favourably with, but was more sensitive than, conventional
serotyping (CS) and allowed us to identify several subtypes of serotype (sst) III, as
described by others. We have also developed a second molecular subtyping method,
based on the family of genes encoding variable surface protein antigens
(bca/rib/alp2/alp3/alp4) and the IgA binding protein C beta (bac), which is more
sensitive and objective than conventional protein serotyping, which cannot type all
isolates and is sometimes misleading. Our methods also can identify more members
of the family of variable antigen genes and distinguish numerous bac subgroups. A
third subtyping method uses five mobile genetic elements (mge), including four
different insertion sequences (IS) and a group II intron, which have been identified in
GBS. The use of this third method further enhances the discriminatory ability of our
genotyping system.

We then used our typing system to examine the population genetic structure
and age-related disease distribution of genotypes among 194 invasive GBS isolates.

We used mainly invasive GBS isolates to demonstrate the practical value of
our genotyping system, confirm their clonal population structure and determine the
distribution of genotypes in different patient groups. The isolates originated from
patients of all ages with GBS sepsis. About half were consecutive GBS isolates from
blood or CSF, at a large diagnostic laboratory in a general adult hospital, with an
obstetric unit (i.e. there were no isolates from children other than neonates). The rest
were consecutive isolates referred for serotyping from all over New Zealand. Thus the
overall age distribution is representative of that in the population affected by GBS
disease, except that children beyond the early neonatal period are probably
under-represented. However, the distribution of genotypes within each age-group
should be representative.

Among our 194 Australasian invasive GBS isolates we identified 56
genotypes, of which five (Ia-1, Ib-1, III-1, III-2 and V-1) accounted for 62% of isolates.

The phylogenetic tree derived from our results showed relationships between
cps serotype and protein gene profiles (pgp). Our results also show that certain
known virulence markers - C beta, C alpha variants and hyaluronidase production
(indirectly) - were associated with distinct clonal lineages.

Our genotyping system, based on three sets of genetic markers, is highly
discriminatory. Because it provides useful phenotypic data, including antigenic
composition, it will be useful for epidemiological surveillance of GBS, especially in
relation to potential GBS vaccine use. Study of the relationships between
putative high-virulence genotypes and patient characteristics (age and/or
underlying risk factors), and whether there are significant differences between
CSF isolates (or genotypes) and other invasive or colonising strains, will be
facilitated by our genotyping system. Using this system, we have demonstrated a
clonal population structure among invasive Australasian GBS isolates. This
system will be applied to colonising GBS isolates, to identify markers of virulence.

Thus, we have developed an alternative to conventional serotyping for
GBS, which is accurate and reproducible, can be performed by any laboratory
with access to PCR/sequencing and, importantly, does not require panels of
serotype-specific antisera that are increasingly difficult to maintain. All isolates
are serotypable and sequencing of a relatively limited 790 bp region can provide
additional serosubtyping information for MS III. The molecular methods we have
described for serotype identification, together with the protein profiling (or protein
antigen subtyping) and identification of mobile genetic elements (or mobile
genetic element subtyping), provide potentially useful markers for further
phylogenetic and epidemiological studies of GBS, as well as comprehensive strain
identification that will be useful for the epidemiological and other related studies
needed to monitor GBS isolates before and after the introduction of GBS
conjugate vaccines.

The various features and embodiments of the present invention, referred to in
individual sections above apply, as appropriate, to other sections, mutatis
mutandis. Consequently features specified in one section may be combined with
features specified in other sections, as appropriate.

All publications mentioned in the above specification are herein
incorporated by reference. Various modifications and variations of the described
methods and system of the invention will be apparent to those skilled in the art
without departing from the scope and spirit of the invention. Although the
invention has been described in connection with specific preferred embodiments,
it should be understood that the invention as claimed should not be unduly limited
to such specific embodiments. Indeed, various modifications of the described
modes for carrying out the invention which are readily apparent to those skilled in
molecular biology or related fields are intended to be within the scope of the
following claims.
Table 1. GBS reference panels used in this study.

Lab strain number             Source     Serotype          MS/serosubtype   GenBank accession number

Reference panel 1 (1)
090                           Channing   Ia                Ia               AF332893
H36B                          Channing   Ib                Ib               AF332903
18RS21                        Channing   II                II               AF332905
M781                          Channing   III               III-2 (3)        AF332896
3139                          Channing   IV                IV               AF332908
CJB111                        Channing   V                 V                AF332910
SS1214                        Channing   VI                VI               AF332901
7271                          Channing   VII               VII              AF332913
JM9 130013                    Channing   VIII              VIII             -

Reference panel 2 (2)
NZRM 908 (NCDC SS615)         ESR        Ia                Ia               AF332894
NZRM 909 (NCDC SS618)         ESR        Ib                Ib               AF332904
NZRM 910 (NCDC SS700)         ESR        Ic                Ia               AF332914
NZRM 911 (NCDC SS619)         ESR        II                II               AF332906
NZRM 912 (NCDC SS620)         ESR        III               III-3 (3)        AF332897
NZRM 2217 (Prague 25/60)      ESR        Non-typable (R)   II               AF332907
NZRM 2832 (Prague 1/82)       ESR        IV                IV               AF332909
NZRM 2833 (Prague 10/84)      ESR        V                 V                AF332911
NZRM 2834 (Prague 118/54)     ESR        VI                VI               AF332902

Notes.

1. Reference panel 1: supplied by Dr Lawrence Paoletti, Channing Laboratory,
Boston, USA.

2. Reference panel 2: New Zealand Reference Medical Culture Collection strains
supplied by Dr Diana Martin, ESR, Porirua, Wellington, New Zealand.

3. MS III serosubtypes based on sequence heterogeneity; see text for more
detail.
Table 3. Specificity and expected lengths of amplicons using different
oligonucleotide primer pairs.

Primer pairs*            Specificity             Length of amplicons (base pairs)

Sag59/Sag190 a           GBS (S. agalactiae)     196
CFBS/CFBA                GBS (S. agalactiae)     241
16SS/23SA                GBS (S. agalactiae)     433
DSF2/DSR1 a              GBS (S. agalactiae)     276
cpsDS/cpsEA1             serotypes Ia to VII     449/458
cpsES/cpsEA2             serotypes Ia to VII     424
cpsES1/cpsEA3            serotypes Ia to VII     505
cpsES2/cpsEFA            serotypes Ia to VII     515
cpsES3/cpsFA b           serotypes Ia to VII     450
cpsFS/cpsGA1 b           serotypes Ia to VII     423
cpsES3/cpsGA1 b          serotypes Ia to VII     790
cpsGS/cpsIA              serotypes Ia and III    1672/1558
cpsGS1/cpsIA             serotypes Ia and III    1662/1548
cpsGS/IacpsHA1           serotype Ia             1127
cpsGS1/IacpsHA1          serotype Ia             1117
IacpsHS/IacpsHA          serotype Ia             296
IacpsHS/IacpsHA1         serotype Ia             574
IacpsHS1/cpsIA c         serotype Ia             354
cpsGS/IbcpsHA1           serotype Ib             1468
cpsGS1/IbcpsHA1          serotype Ib             1458
cpsGS/IbcpsIA            serotype Ib             1660
cpsGS1/IbcpsIA           serotype Ib             1650
IbcpsHS/IbcpsHA          serotype Ib             282
IbcpsHS1/IbcpsHA1        serotype Ib             349
IbcpsHS2/IbcpsIA         serotype Ib             347
IbcpsIS/IbcpsIA1 c       serotype Ib             523
cpsGS/IIIcpsHA           serotype III            1063
cpsGS1/IIIcpsHA          serotype III            1053
IIIVIcpsHS/IIIcpsHA      serotype III            543
IIIcpsHS/cpsIA c         serotype III            641
cpsGS/IVcpsHA            serotype IV             1372
cpsGS1/IVcpsHA           serotype IV             1362
cpsGS/IVcpsMA            serotype IV             1686
cpsGS1/IVcpsMA           serotype IV             [illegible]
IVcpsHS/IVcpsHA          serotype IV             [illegible]
IVcpsHS1/IVcpsMA c       serotype IV             [illegible]
cpsGS/VcpsHA1            serotype V              [illegible]
cpsGS1/VcpsHA1           serotype V              [illegible]
cpsGS/VcpsMA             serotype V              [illegible]
cpsGS1/VcpsMA            serotype V              [illegible]
VcpsHS/VcpsHA            serotype V              [illegible]
VcpsHS1/VcpsHA1          serotype V              [illegible]
VcpsHS2/VcpsMA c         serotype V              [illegible]
IIIVIcpsHS/VIcpsHA       serotype VI             398
cpsGS/VIcpsHA1           serotype VI             1205
cpsGS1/VIcpsHA1          serotype VI             1195
cpsGS/VIcpsIA            serotype VI             1527
cpsGS1/VIcpsIA           serotype VI             1517
VIcpsHS/VIcpsHA1 c       serotype VI             327
VIcpsHS1/VIcpsIA         serotype VI             360

Notes.

*See Table 2 for primer sequences and Figure 1 for some primer sites.
Primers used in the algorithm for molecular serotype identification (Figure 2):
a. to identify GBS, b. for sequencing, c. for MS-specific PCR.
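
The MS-specific PCR step can be illustrated with a minimal sketch that compares an observed band size with the expected amplicon length of each primer pair marked "c" in Table 3. Only pairs with a legible length are included. The function name, the input format and the sizing tolerance are illustrative assumptions and are not part of the described method.

```python
# Expected amplicon lengths (bp) for primer pairs that Table 3 marks "c"
# (MS-specific PCR). Pairs whose lengths are illegible in the table are omitted.
MS_SPECIFIC_PAIRS = {
    "IacpsHS1/cpsIA": ("Ia", 354),
    "IbcpsIS/IbcpsIA1": ("Ib", 523),
    "IIIcpsHS/cpsIA": ("III", 641),
    "VIcpsHS/VIcpsHA1": ("VI", 327),
}


def call_molecular_serotypes(observed_bands, tolerance_bp=20):
    """Return serotypes whose MS-specific band matches the expected length.

    observed_bands maps a primer pair name to the estimated band size in bp,
    or to None when no product was seen. tolerance_bp is a hypothetical
    allowance for gel sizing error, not a value taken from the source.
    """
    calls = []
    for pair, (serotype, expected_bp) in MS_SPECIFIC_PAIRS.items():
        size = observed_bands.get(pair)
        if size is not None and abs(size - expected_bp) <= tolerance_bp:
            calls.append(serotype)
    return calls


if __name__ == "__main__":
    gel = {"IacpsHS1/cpsIA": None, "IIIcpsHS/cpsIA": 650}
    print(call_molecular_serotypes(gel))
```

An isolate giving no matching band with any listed pair returns an empty list, which in practice would prompt sequencing rather than a serotype call.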
Table 5. Comparison of the results of conventional serotyping (CS) and
molecular serotype identification (MS)/subtyping of 206 clinical GBS
isolates.

                  MS/serosubtype
CS                Ia    Ib    II       III-1(1)  III-2(1)  III-3(1)  III-4(1)  IV    V     VI    VIII

Ia                38    -     -        -         -         -         -         -     -     -     -
Ib                -     30    -        -         -         -         -         -     -     -     -
II                -     -     25       -         -         -         -         -     -     -     -
III               -     -     -        27        20        4         3         -     -     -     -
IV                -     -     -        -         -         -         -         7     -     -     -
V                 -     -     -        -         -         -         -         -     31    -     -
VI                -     -     -        -         -         -         -         -     -     2     -
VIII              -     -     -        -         -         -         -         -     -     -     1
NT(1)             2     5     1        3         1         -         -         -     5     1     -

Total (206)(2)    40    35    26(2)    30        21(2)     4         3         7     36    3     1

Notes.

1. For details of MS III serosubtypes see text.

2. One mixed culture was included as two separate isolates (one serotype II, one
subtype III-2).
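
The comparison in Table 5 can be checked arithmetically. The sketch below re-enters the counts as a cross-tabulation and derives the column totals, the number of isolates that CS could not type, and the number for which CS and MS agree at serotype level. The variable and function names are illustrative only.

```python
MS_TYPES = ["Ia", "Ib", "II", "III-1", "III-2", "III-3", "III-4",
            "IV", "V", "VI", "VIII"]

# Rows are CS results; each row maps an MS serotype/serosubtype to a count.
TABLE5 = {
    "Ia": {"Ia": 38},
    "Ib": {"Ib": 30},
    "II": {"II": 25},
    "III": {"III-1": 27, "III-2": 20, "III-3": 4, "III-4": 3},
    "IV": {"IV": 7},
    "V": {"V": 31},
    "VI": {"VI": 2},
    "VIII": {"VIII": 1},
    "NT": {"Ia": 2, "Ib": 5, "II": 1, "III-1": 3, "III-2": 1, "V": 5, "VI": 1},
}


def column_totals(table):
    """Sum the counts for each MS serotype/serosubtype over all CS rows."""
    totals = {ms: 0 for ms in MS_TYPES}
    for row in table.values():
        for ms, count in row.items():
            totals[ms] += count
    return totals


def base_serotype(ms_type):
    """Drop the serosubtype suffix, e.g. 'III-2' becomes 'III'."""
    return ms_type.split("-")[0]


totals = column_totals(TABLE5)
grand_total = sum(totals.values())
nontypable_by_cs = sum(TABLE5["NT"].values())
concordant = sum(
    count
    for cs, row in TABLE5.items() if cs != "NT"
    for ms, count in row.items() if base_serotype(ms) == cs
)

if __name__ == "__main__":
    print(totals)
    print(grand_total, nontypable_by_cs, concordant)
```

On these counts every isolate that CS could type received the same serotype by MS, and the remaining isolates were non-typable by CS but typable by MS.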
Table 7. Specificity and expected lengths of amplicons using different
primer pairs.

Primer pairs*            Specificity                              Length of amplicons   Protein profile
                                                                  (base pairs)          code

IgAagGBS/RIgAagGBS       bac                                      532-838               B
IgAS1/IgAA1              bac                                      303-591               B
GBS1360S/GBS1937A        bac                                      652                   B
GBS1717S/GBS1937A        bac                                      292                   B
bcaS1/bcaA               5'-end of bca                            390                   A
bcaS2/bcaA               5'-end of bca                            342                   A
bcaRUS/bcaRUA            bca repetitive unit/                     235                   a/as
                         bca repetitive unit-like region
bcaS1/balA               alp2/alp3                                446                   alp2 or alp3
bcaS2/balA               alp2/alp3                                398                   alp2 or alp3
balS/balA                alp2/alp3                                302                   alp2 or alp3
bal23S1/bal2A1           alp2                                     334                   alp2
bal23S2/bal2A1           alp2                                     253                   alp2
bal23S1/bal2A2           alp2                                     426                   alp2
bal23S2/bal2A2           alp2                                     345                   alp2
bal23S1/bal3A            alp3                                     321                   alp3
bal23S2/bal3A            alp3                                     240                   alp3
#ribS1/ribA3             rib/rib-like                             355                   R/r
ribS2/ribA1              rib                                      194                   R
ribS2/ribA2              rib                                      225                   R
ribS2/ribA3              rib                                      333                   R

Notes.

*See Table 6 for primer sequences.

#For sequencing use only; not entirely specific for the rib gene (see text for more
detail).
Table 8. Genetic groups and subgroups of bac gene (C beta protein gene)
based on amplicon length (using primers IgAagGBS/RIgAagGBS) and
sequence heterogeneity.

Group or    N=    Amplicon   GenBank      No. of different sites     Molecular serotype/
subgroup          length     accession    compared with (c.f.)       serosubtypes*
                             number       main group

B1          19    532        X58470       -                          17 = Ib; 2 = II
B1a         1     532        AF362686     1 (c.f. B1)                Ib
B2          3     550        AF362687     -                          Ib, II, III-4
B3          2     586        AF362688     -                          2 = Ib
B3a         1     586        AF362689     4 (c.f. B3)                V
B3b         1     586        AF362690     21 (c.f. B3)               VI
B3c         1     586        AF362691     24 (c.f. B3)               Ib
B4          8     604        AF362692     -                          4 = Ib; 4 = II
B4a         1     604        AF362693     1 (c.f. B4)                II
B4b         2     604        AF362694     2 (c.f. B4)                2 = Ib
B5          2     622        AF362695     -                          Ia, VI
B5a         1     622        AF362696     2 (c.f. B5)                Ia
B6          1     640        AF362697     -                          Ib
B7          1     658        AF362698     -                          Ib
B7a         1     658        AF362699     34 (c.f. B7)               VI
B8          1     712        AF362700     -                          Ib
B9          2     748        AF362701     -                          2 = II
B9a         1     748        AF362702     13 (c.f. B9)               Ib
B10         2     820        AF362703     -                          2 = Ib
B11         1     838        AF362704     -                          Ib

*See Table 9 for further details of serotype/serosubtype relationships with protein
antigens.
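
Because the main bac groups in Table 8 are defined by amplicon length, the first step of group assignment amounts to a lookup, sketched below. The lowercase subgroups (B1a, B3a and so on) share a length with their main group and differ only in sequence, so they cannot be resolved this way. The helper names are illustrative; the 18 bp step is simply an arithmetic property of the listed lengths and is not a statement taken from the text.

```python
# Main bac groups keyed by the IgAagGBS/RIgAagGBS amplicon length (bp), Table 8.
BAC_GROUP_BY_LENGTH = {
    532: "B1", 550: "B2", 586: "B3", 604: "B4", 622: "B5", 640: "B6",
    658: "B7", 712: "B8", 748: "B9", 820: "B10", 838: "B11",
}


def bac_main_group(amplicon_bp):
    """Return the main bac group for an exact amplicon length, else None.

    Subgroups are defined by sequence differences, so sequencing is still
    needed to distinguish, for example, B3 from B3a, B3b or B3c.
    """
    return BAC_GROUP_BY_LENGTH.get(amplicon_bp)


def steps_above_shortest(amplicon_bp, step_bp=18, shortest_bp=532):
    """Number of 18 bp steps separating an amplicon from the 532 bp group."""
    extra = amplicon_bp - shortest_bp
    if extra < 0 or extra % step_bp != 0:
        raise ValueError("length does not fit the 18 bp spacing of Table 8")
    return extra // step_bp


if __name__ == "__main__":
    for length in sorted(BAC_GROUP_BY_LENGTH):
        print(length, bac_main_group(length), steps_above_shortest(length))
```

Every length in the table lies a whole number of 18 bp steps above 532 bp, which gives a quick plausibility check on a sequenced amplicon before assigning it to a group.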
Table 9. The relationship between GBS protein gene profiles and capsular
polysaccharide (cps) molecular serotypes/serosubtypes.

Serotype/        N=     None   Aa    AaB   R     alp3   a     as    alp2as   RB    Ra
serosubtype*

Ia               43     -      -     2     -     -      35    3     3        -     -
Ib               37     -      1     35    -     1      -     -     -        -     -
II               29     -      3     10    8     2      5     -     -        -     1
III-1            30     -      -     -     30    -      -     -     -        -     -
III-2            22     -      -     -     22    -      -     -     -        -     -
III-3            5      -      -     -     -     -      -     -     5        -     -
III-4            3      -      -     1     -     1      -     -     1        -     -
IV               9      -      -     -     1     -      8     -     -        -     -
V                38     1      -     -     1     35     -     -     -        1     -
VI               5      -      1     3     -     -      1     -     -        -     -
VII              1      -      -     -     -     1      -     -     -        -     -
VIII             2      1      -     -     -     1      -     -     -        -     -

Total            224    2      5     51    62    41     49    3     9        1     1

Note.

*See text for explanation of cps serosubtypes and Table 7 for explanation of
protein antigen gene profile codes.
Table 10. Oligonucleotide primers used in this study.

Primer      Target    Tm °C (1)        GenBank accession     Sequence (2)
                                       number(s)

IS861S      IS861     77.[illegible]   M22449                445 GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG 479
IS861A1     IS861     77.3             M22449                831 CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C 795
IS861A2     IS861     76.1             M22449                1020 CAA ACT CCG TCA CAT CGG TAT AGC ACT TCT CAT AGG 985
IS1548S     IS1548    76.5             Y14270                143 CTA TTG ATG ATT GCG CAG TTG AAT TGG ATA GTC GTC 178
IS1548S1    IS1548    77.0             Y14270                539 GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG 574
IS1548A1    IS1548    77.0             Y14270                574 CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC 539
IS1548A2    IS1548    70.3             Y14270                915 CCC AAT ACC ACG TAA CTT ATG CCA TTT G 888
IS1548A3    IS1548    78.0             Y14270                930 CGT GTT ACG AGT CAT CCC AAT ACC ACG TAA CTT ATG CC 893
IS1381S1    IS1381    80.1             AF064785/AF367974     272/818 CTT ATG AAC AAA TTG CGG CTG ATT TTG GCA TTC ACG 307/853
IS1381S2    IS1381    81.7             AF064785/AF367974     497/1040 GGC TCA GGC GAT TGT CAC AAG CCA AGG GAG 526/1069
IS1381A     IS1381    73.1             AF064785/AF367974     881/1424 CTA AAA TCC TAG [illegible] ACG GTT GAT CAT TCC AGC 849/1392
ISSa4S      ISSa4     78.5             AF165983              326 CGT ATC TGT CAC TTA TTT CGC TGC GGG TGT CTC C 359
ISSa4A1     ISSa4     75.2             AF165983              639 GCC GAT GTC ACA ACA TAG TTC AGG ATA TAG CCA G 606
ISSa4A2     ISSa4     74.5             AF165983              780 CGT AAA GGA GTC CAA AGA TGA TAG CCT TTT TGA ACC 745
GBSi1S1     GBSi1     78.6             AJ292930              721 CAT CTC GGA ACA ATA TGC TCG AAG CTT ACA AGC AAG TG 758
GBSi1S2     GBSi1     77.3             AJ292930              789 GGG GTC ACT ATC GAG CAG ATG GAT GAC TAT CTT CAC 824
GBSi1A1     GBSi1     83.9             AJ292930              1058 AAT GGC TGT TTC GCA GGA GCG ATT GGG TCT GAA CC 1024
GBSi1A2     GBSi1     80.5             AJ292930              1161 CCA GGG ACA TCA ATC TGT CTT GCG GAA CAG TAT CG 1127

Notes.

1. The primer Tm values were provided by the primer synthesiser (Sigma-Aldrich).

2. Numbers represent the numbered base positions at which primer sequences
start and finish (numbering start point "1" refers to the start point "1" of the
corresponding gene GenBank accession number).
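
Note 2 implies two internal checks on each entry of Table 10: the number of bases in a primer should equal the span between its start and finish positions, and a sense and antisense primer covering the same positions should be reverse complements of each other. The following sketch applies both checks to a few of the listed primers. The helper names are illustrative and are not part of the specification.

```python
# (start position, finish position, sequence as printed in Table 10)
PRIMERS = {
    "IS861S": (445, 479, "GAG AAA ACA AGA GGG AGA CCG AGT AAA ATG GGA CG"),
    "IS861A1": (831, 795, "CAC GAT TTC GCA GTT CTA AAT AAA TCC GAC GAT AGC C"),
    "IS1548S1": (539, 574, "GTT TGG GAC AGG TAG CGG TTG AGG AGA AAA GTA ATG"),
    "IS1548A1": (574, 539, "CAT TAC TTT TCT CCT CAA CCG CTA CCT GTC CCA AAC"),
    "IS1548A2": (915, 888, "CCC AAT ACC ACG TAA CTT ATG CCA TTT G"),
}


def bases(spaced_sequence):
    """Remove the triplet spacing used in the table."""
    return spaced_sequence.replace(" ", "")


def span_length(start, finish):
    """Number of bases covered, whichever strand the primer lies on."""
    return abs(finish - start) + 1


def reverse_complement(sequence):
    """Reverse complement of an unambiguous DNA sequence."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def coordinates_match(name):
    """True when the primer length equals the span of its coordinates."""
    start, finish, sequence = PRIMERS[name]
    return len(bases(sequence)) == span_length(start, finish)


if __name__ == "__main__":
    for name in PRIMERS:
        print(name, coordinates_match(name))
    sense = bases(PRIMERS["IS1548S1"][2])
    antisense = bases(PRIMERS["IS1548A1"][2])
    print(reverse_complement(sense) == antisense)
```

IS1548S1 and IS1548A1 cover the same positions (539 to 574) on opposite strands, so the second check is a useful guard against transcription errors in either sequence.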
Table 11. Specificity and expected lengths of amplicons using different
oligonucleotide primer pairs.

Primer pairs*            Specificity    Length of amplicons (base pairs)

IS861S/IS861A1           IS861          387
IS861S/IS861A2           IS861          576
IS1548S/IS1548A1         IS1548         432
IS1548S/IS1548A2         IS1548         773
IS1548S/IS1548A3         IS1548         788
IS1548S1/IS1548A2        IS1548         377
IS1548S1/IS1548A3        IS1548         392
IS1381S1/IS1381A         IS1381         610/607#
IS1381S2/IS1381A         IS1381         385
ISSa4S/ISSa4A1           ISSa4          314
ISSa4S/ISSa4A2           ISSa4          455
GBSi1S1/GBSi1A1          GBSi1          338
GBSi1S1/GBSi1A2          GBSi1          441
GBSi1S2/GBSi1A1          GBSi1          270
GBSi1S2/GBSi1A2          GBSi1          373

Notes.

*See Table 10 for primer sequences.

# Our sequencing result (GenBank accession number AF367974) was 5 bp
shorter than that previously described by Tamura et al., 2000 (GenBank
accession number AF064785).
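
The expected amplicon lengths in Table 11 follow from the primer coordinates in Table 10: the length is the 5' position of the antisense primer minus the 5' position of the sense primer, plus one. The sketch below reproduces the single-valued entries of the table from those coordinates. The dictionary and function names are illustrative.

```python
# 5' (start) position of each primer on its GenBank reference, from Table 10.
FIVE_PRIME_POSITION = {
    "IS861S": 445, "IS861A1": 831, "IS861A2": 1020,
    "IS1548S": 143, "IS1548S1": 539,
    "IS1548A1": 574, "IS1548A2": 915, "IS1548A3": 930,
    "ISSa4S": 326, "ISSa4A1": 639, "ISSa4A2": 780,
    "GBSi1S1": 721, "GBSi1S2": 789, "GBSi1A1": 1058, "GBSi1A2": 1161,
}

# Expected lengths as listed in Table 11 (IS1381 pairs omitted because they
# are numbered against two different accessions).
TABLE11 = {
    ("IS861S", "IS861A1"): 387, ("IS861S", "IS861A2"): 576,
    ("IS1548S", "IS1548A1"): 432, ("IS1548S", "IS1548A2"): 773,
    ("IS1548S", "IS1548A3"): 788, ("IS1548S1", "IS1548A2"): 377,
    ("IS1548S1", "IS1548A3"): 392,
    ("ISSa4S", "ISSa4A1"): 314, ("ISSa4S", "ISSa4A2"): 455,
    ("GBSi1S1", "GBSi1A1"): 338, ("GBSi1S1", "GBSi1A2"): 441,
    ("GBSi1S2", "GBSi1A1"): 270, ("GBSi1S2", "GBSi1A2"): 373,
}


def amplicon_length(sense, antisense):
    """Inclusive distance between the 5' ends of the two primers."""
    return FIVE_PRIME_POSITION[antisense] - FIVE_PRIME_POSITION[sense] + 1


if __name__ == "__main__":
    for (sense, antisense), listed in TABLE11.items():
        print(sense, antisense, amplicon_length(sense, antisense), listed)
```

The same arithmetic applied to the two IS1381 numbering schemes (272 to 881 and 818 to 1424) gives 610 and 607, the paired value shown for IS1381S1/IS1381A.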
Table 12. Relationship between mobile genetic elements and capsular
polysaccharide serotypes, serotype III subtypes and surface protein gene
profiles.

Serotype/      Protein gene   N=     IS861   IS1548   IS1381   ISSa4   GBSi1   No mobile
serosubtype    profile                                                         element

Ia             AaB            2      2       -        2        -       -       -
Ia             alp2as         3      -       -        -        -       -       3
Ia             a              35     3       1        35       1       -       -
Ia             as             3      -       -        3        -       -       -
subtotal                      43     5       1        40       -       -       3

Ib             Aa             1      -       -        -        -       -       1
Ib             AaB            35     30      -        35       1       -       -
Ib             alp3           1      -       -        1        -       -       -
subtotal                      37     30      -        36       1       -       1

II             Aa             3      3       1        3        2       1       -
II             AaB            10     10      5        10       5       1       -
II             alp3           2      1       1        2        -       -       -
II             R              8      8       -        8        -       8       -
II             Ra             1      1       -        -        -       1       -
II             a              5      2       2        5        3       5       -
subtotal                      29     25      9        28       -       16      -

III-1          R              30     30      30       30       1       -       -
III-2          R              22     22      -        -        -       22      -
III-3          alp2as         5      -       -        -        -       -       5
III-4          AaB            1      1       -        1        -       1       -
III-4          alp2as         1      -       -        -        -       1       -
III-4          alp3           1      -       -        1        -       1       -
subtotal                      60     53      30       32       1       25      5

IV             R              1      1       -        1        -       1       -
IV             a              8      2       -        8        -       -       -
subtotal                      9      3       -        9        -       -       -

V              alp3           35     3       1        35       1       1       -
V              R              1      1       -        1        1       -       -
V              RB             1      1       -        1        -       -       -
V              none           1      -       -        -        -       -       1
subtotal                      38     5       1        37       1       1       2

[For the remaining rows the column positions of the mobile-element counts are not
recoverable from the source; the legible cell values are listed in source order.]

VI             Aa             1      -, 1
VI             AaB            3      3, 3
VI             a              1      -, 1
subtotal                      5      3, 0
VII            alp3           1      1
VIII           alp3           1      1
VIII           none           1      1
subtotal                      2      -, 2

Total                         224    IS861 124; IS1548 41; remaining totals illegible

Note.

A: 5'-end of bca gene (C alpha protein);
a: bca gene repetitive unit or bca gene repetitive unit-like sequence (multiple band
amplicon);
as: bca gene repetitive unit or bca gene repetitive unit-like sequence (single band
amplicon);
B: C beta/IgA binding protein (bac) gene;
R: Rib protein (rib) gene;
alp2: C alpha-like protein 2 (alp2) gene;
alp3: C alpha-like protein 3 (alp3) gene;
r: assumed Rib-like protein gene.
Table 13. Distribution of mobile genetic elements among 194 invasive
GBS isolates.

Mobile genetic elements present

[The column headings and the first rows of this table are illegible in the source. In
the legible rows below, the first number is taken to be the number of isolates and the
repeated number marks each column in which an element was recorded; "-" marks an
empty cell.]

No. of isolates    Col. 1      Col. 2      Col. 3     Col. 4    Col. 5     Col. 6

29                 29          29          29         -         -          -
6                  6           6           -          6         -          -
8                  8           8           -          -         8          -
18                 -           18          -          -         18         -
1                  1           -           -          -         1          -
1                  1           -           1          -         1          -
2                  2           2           2          -         2          -
2                  2           -           -          2         2          -

Total (n=194)      168 (87%)   100 (52%)   33 (17%)   11 (6%)   34 (18%)   6 (3%)

Note.

Data are numbers of isolates containing various combinations of mge.
Table 14. Relationship between GBS genotypes and invasive disease age.

                      Age-group/disease (1)
Serotype/genotype     0-6d          7d-3m            4m-14yr       Total

Ia-1                  14            4+1              1             35+1 (19%)
[label illegible]     4             2                -             10
Ia total              18 (34%)      6+1 (21%)        1 (10%)       45+1 (24%)
Ib-1                  2             1+1              -             13+2
[label illegible]     3             4+2              -             16+2
Ib total              5 (9.4%)      5+3 (24%)        -             29+4 (17%)
II                    8 (15%)       1 (3%)           -             18+1 (10%)
III-1                 6+1 (13%)     4 (12%)          1+1 (20%)     22+4 (13%)
III-2                 5 (9%)        5+4 (26%) (3)    1 (10%)       13+4 (9%)
III-(3-4)             1+1           1                -             5+1
III total             12+2 (26%)    10+4 (41%)       2+1 (30%)     40+9 (25%)
IV total              3             -                -             7 (4%)
V-1                   3             3                2             27+1 (14%)
V-(2-7)               1             1                -             7
V total               4 (8%)        4 (12%)          2 (20%)       34+1 (18%)
VI total              1             -                -             4+1 (3%)

TOTAL by age-group: 0-6d 51+2=53; 7d-3m 26+8=34; 4m-14yr 5+2=7; 15-45 yr 27+1=28;
46-60 yr 16+2=18; >60 yr 52+2=54; all ages 177+17=194.

[Individual entries for the 15-45 yr, 46-60 yr and >60 yr columns are not recoverable
from the source, apart from the column totals given above.]

Notes:

1. Numbers after "+" refer to CSF isolates; all others are from blood.

2. Five were aged 4m-1yr and one case was aged 3 yr.

3. Sst III-2 in late onset infection compared with all other groups: p=0.0005, odds
ratio (OR) 6.8; 95% confidence interval (CI) 2.4-19.4.

4. [Illegible] compared with all other age-groups: p=0.001, OR 0.28; 95% CI
0.13-0.59.
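
The odds ratio quoted in note 3 can be reproduced from the counts in Table 14, as the sketch below shows. It takes 9 sst III-2 isolates among the 34 late onset (7d-3m) isolates and 17 sst III-2 isolates among all 194 isolates. The confidence interval uses the standard log (Woolf) approximation; this choice is an assumption, since the specification does not say how the interval was calculated.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls,
    d: unexposed controls. The interval uses the standard error of
    log(OR), sqrt(1/a + 1/b + 1/c + 1/d) (Woolf method).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from Table 14.
late_onset_total = 34          # 7d-3m column total
late_onset_iii2 = 9            # sst III-2 in late onset infection (5+4)
all_total = 194
all_iii2 = 17                  # sst III-2 in all age-groups (13+4)

a = late_onset_iii2
b = late_onset_total - late_onset_iii2
c = all_iii2 - late_onset_iii2
d = (all_total - late_onset_total) - c

odds_ratio, lower, upper = odds_ratio_with_ci(a, b, c, d)

if __name__ == "__main__":
    print(round(odds_ratio, 1), round(lower, 1), round(upper, 1))
```

With these counts the calculation gives an odds ratio of about 6.8 with an interval of about 2.4 to 19.4, in agreement with the values stated in the note.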
CLAIMS

1. A method of typing a group B streptococcal bacterium which method
comprises analysing the nucleotide sequence of one or more regions within the
cpsD, cpsE, cpsF, cpsG and/or cpsI/M genes of said bacterium, said region(s)
comprising one or more nucleotides whose sequence varies between types.

2. A method according to claim 1 wherein the nucleotide sequence is analysed 
for one or more positions corresponding to positions 62, 78-86, 138, 139, 144, 198, 
204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 457, 466, 486, 602, 606, 627, 
636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 1278, 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

3. A method according to claim 1 wherein at least one region is within a
sequence delineated by the 3' 136 bases of the cpsE gene and the 5' 218 bases of
the cpsG gene of the cpsE-cpsF-cpsG gene cluster of said streptococcal
bacterium.

4. A method according to claim 3 wherein the nucleotide sequence is analysed 
for one or more positions corresponding to positions 1413, 1495, 1500, 1501, 
1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832, 1856, 1866, 1871, 
1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in Figure 1. 

5. A method according to any one of claims 1 to 4 wherein at least one region
is within the cpsI/M genes of said bacterium.

6. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises sequencing said one or more regions. 

7. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises determining whether a polynucleotide obtained 
from said bacterium selectively hybridises to a polynucleotide probe comprising 
one or more of the said regions. 

8. A method according to claim 7 which comprises determining whether the 
polynucleotide obtained from said bacterium hybridises to one or more of a 
plurality of polynucleotide probes corresponding to one or more of the said regions. 
9. A method according to claim 8 wherein the plurality of polynucleotide probes
are present as a microarray.

10. A method according to any one of claims 1 to 5 wherein the nucleotide 
sequence analysis step comprises an amplification step using one or more 
primers, at least one of which hybridises specifically to a sequence which differs 
between types. 

11. A method according to any one of claims 1 to 6 wherein the nucleotide
sequence analysis step comprises an amplification step using primer pairs, at least
one of which hybridises specifically to a sequence which differs between types.

12. A method according to claim 10 or claim 11 wherein said primers are 
selected from the primers shown in Table 2. 

13. A method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more surface protein genes selected from rib, alp2 or alp3 genes.

14. A method according to claim 13 wherein determining the presence or
absence of said surface protein genes comprises determining whether a
polynucleotide obtained from said bacterium selectively hybridises to a
polynucleotide probe corresponding to a region of said surface protein genes.

15. A method according to claim 13 wherein determining the
presence or absence of said surface protein genes comprises an amplification step
using one or more primers which amplify specifically a region of said surface
protein genes.

16. A method according to claim 15 wherein said primers are selected from the
primers shown in Table 6.

17. A method according to any one of claims 1 to 12 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more surface protein genes selected from rib, alp2 or alp3 genes.
18. A method of typing a group B streptococcal bacterium which method
comprises determining the presence or absence in the genome of said bacterium
of one or more mobile genetic elements selected from IS861, IS1548, IS1381,
ISSa4 and GBSi1.

19. A method according to claim 18 wherein determining the presence or 
absence of said mobile genetic elements comprises determining whether a 
polynucleotide obtained from said bacterium selectively hybridises to a 
polynucleotide probe corresponding to a region of said mobile genetic elements. 

20. A method according to claim 18 wherein determining the
presence or absence of said mobile genetic elements comprises an amplification
step using one or more primers which amplify specifically a region of said mobile
genetic elements.

21. A method according to claim 20 wherein said primers are selected from the 
primers shown in Table 10. 

22. A method according to any one of claims 13 to 17 which further comprises
determining the presence or absence in the genome of said bacterium of one or
more mobile genetic elements selected from IS861, IS1548, IS1381, ISSa4 and
GBSi1.

23. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a cpsD-cpsE-cpsF-cpsG gene of a group B 
streptococcal bacterium, said polynucleotide comprising one or more nucleotides 
which differ between group B streptococcal serotypes. 

24. A polynucleotide according to claim 23 wherein said nucleotides which differ 
between group B streptococcal serotypes correspond to one or more of positions 
62, 78-86, 138, 139, 144, 198, 204, 211, 281, 240, 249, 300, 321, 419, 429, 437, 
457, 466, 486, 602, 606, 627, 636, 645, 803, 971, 1026, 1044, 1173, 1194, 1251, 
1278, 1413, 1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 
1655, 1832, 1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as 
shown in Figure 1. 

25. A polynucleotide consisting essentially of at least 10 contiguous nucleotides
corresponding to a region within a sequence delineated by the 3' 136 base pairs of
cpsE and the 5' 218 base pairs of cpsG of the cpsE-cpsF-cpsG gene cluster of a
group B streptococcal bacterium, said polynucleotide comprising one or more
nucleotides which differ between group B streptococcal types.

26. A polynucleotide according to claim 25 wherein said nucleotides which differ
between group B streptococcal types correspond to one or more of positions 1413,
1495, 1500, 1501, 1512, 1518, 1527, 1595, 1611, 1620, 1627, 1629, 1655, 1832,
1856, 1866, 1871, 1892, 1971, 2026, 2088, 2134, 2187 and 2196 as shown in
Figure 1.

27. A polynucleotide consisting essentially of at least 10 contiguous nucleotides
corresponding to a region within a cpsI/M gene of a group B streptococcal
bacterium, said polynucleotide comprising one or more nucleotides which differ
between streptococcal serotypes.

28. A polynucleotide according to claim 27 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 2. 

29. A polynucleotide consisting essentially of at least 10 contiguous nucleotides 
corresponding to a region within a rib, alp2 or alp3 gene of a group B streptococcal 
bacterium, said polynucleotide comprising one or more nucleotides which differ 
between group B streptococcal subtypes. 

30. A polynucleotide according to claim 29 wherein the polynucleotide is 
selected from the nucleotide sequences shown in Table 6. 

31. Use of a polynucleotide according to any one of claims 23 to 30 in a method
of serotyping and/or subtyping a group B streptococcal bacterium.

32. A composition comprising a plurality of polynucleotides according to any 
one of claims 23 to 30. 

33. Use of a composition according to claim 32 in a method of serotyping and/or 
subtyping a group B streptococcal bacterium. 

34. A microarray comprising a plurality of polynucleotides according to any one 
of claims 23 to 30. 
35. Use of a microarray according to claim 34 in a method of serotyping and/or
subtyping a group B streptococcal bacterium.
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Figure 1. Multiple sequence alignments of the regions of the 3' end of cpsD-cpsE-cpsF-and 
the 5' end of cpsG for reference strains of serotypes la to VIL 



50 



Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype^Ia-2 — ^TGGTTCA AAGTTCTTAG GTATTATTCT 

51 



C P sDS 100 



Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 <3~ 

Serotype VII 9~ 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V 



•••••• 



Serosubtype^Ia-2 ftATGAATCTG TTGCTACTTA CGGCGATTAC GGCGATTATG 

150 



101 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 



Serosubtype^a-2 — — — AGAAAAAGGA AG ^ GGGGC TCTTGTATTG 

CpsD | 

200 

151 _ 

Serosubtype III-2 ~ ~~" 

Serotype VI " ~ c — 

Serotype lb ™ 

Serotype II/III-4 ~ 

Serotype VII " "~" 

Serosubtype III-3 "~~ "I" 

Serosubtype Ia-1 " ™ 

Serosubtype III-l " "~ c — 
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Serotype V 
Serosubtype Ia-2 
Consensus 

Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype lb 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-l 



2/25 

. ■ M— 

AAAGAAAAAG AAAATATACA AAAGATTATT ATAGCGATGA TTCAAACAGT 

I cpsE 

201 250 

a . — 

. g — — ;-t — ; ! c 

— ; — 



-c- 



TGTGGTTTAT TTTTCTGCAA GTTTGACATT AACATTAATT ACTCCCAATT 

251 300- 
1 



TTAAAAGCAA TAAAGATTTA TTGTTTGTTC TATTGATACA TTATATTGTC 
301 350 



TTTTATCTTT CTGATTTTTA CAGAG ACTTT TGGAGTCGTG GCTATCTTGA 

cpsES 

351 400 
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Serotype IV 

Serotype V 

Serosubtype Ia-2 " 71177 

Consensus AGAGTTTAAA ATGGTATTGA AATACAGCTT TTACTATATT TTCATATCAA 



401 



450 



Serosubtype III-2 

Serotype VI 

Serotype lb c 

Serotype II/III-4 a 

Serotype VII a t% " 

Serosubtype III-3 

Serosubtype Ia-1 a " 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 
Consensus 



GTTCATTATT TTTTATTTTT AAAAACTCTT TT ACAACGAC ACGACTTTCC 

cpsEAl 
500 



451 



Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 c 

Serotype VII c 

Serosubtype III-3 

Serosubtype Ia-1 1 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 
Consensus 



TTTTTTACTT TTATTGCTAT GAATTCGATT TTATTATATC TATTGAATTC 



501 



550 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 



ATTTTTAAAA TATTATCGAA AATATTCTTA CGCTAAGTTT TCACGAGATA 



551 



600 
Serotype V 

Serosubtype Ia-2 

Consensus CCAAAGTTGT TTTGATAACG AATAAGGATT CTTTATCAAA AATGACCTTT 

601 650 

Serosubtype III-2 — 

Serotype VI c 

Serotype lb — — 1 

Serotype II/III-4 -a — — 

Serotype VII -a 

Serosubtype III-3 ; 

Serosubtype Ia-1 1™ c 

Serosubtype III-l c 

Serotype IV 1 

Serotype V : 1 — ■ 

Serosubtype Ia-2 — 1 c 

Consensus AGGAATAAAT ACGACCATAA TTATATCGCT GTCTGTAT CT TGGACTCCTC 

651 700 

Serosubtype III-2 

Serotype VI — 

Serotype lb 

Serotype II/III-4 t 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 — > 

Serosubtype III-l 

Serotype IV -- 

Serotype V 

Serosubtype Ia-2 ■ 

Consensus TGAAAAGGAT TG TTATGATT TGAAACATAA CTCGTTAAGG ATAATAAACA 
cpsESl 

701 750 

Serosubtype III-2 — 

Serotype VI ■ ■ 

Serotype. Ib ■ 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 — ■ 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV 

Serotype V : ■ 

Serosubtype Ia-2 K 

Consensus AAGATGCTCT TACTTCAGAG TTA ACCTGCT TAACTGTTGA TCAAGCTTTT 

cpsEA2 

751 800 

Serosubtype III-2 

Serotype VI — 

Serotype Ib 

Serotype II/III-4 ; ; 

Serotype VII 

Serosubtype III-3 . 

Serosubtype Ia-1 

Serosubtype III-l — — ; 

Serotype IV 

Serotype V — ; ■ • 
Serosubtype Ia-2 

Consensus ATTAACATAC CCATTGAATT ATTTGGTAAA TACCAAATAC AAGATATTAT 

801 850 

Serosubtype III-2 ~ 

Serotype VI — t 

Serotype lb 

Serotype II/III-4 ~ ~Z~ 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 ~~ 

Serosubtype III-l " 

Serotype IV " """"""-"I 

Serotype V "_I 

Serosubtype la— 2 

Consensus TAATGACATT GAAGCAATGG GAGTGATTGT CAATGTTAAT GTAGAGGCAC 

851 I™ 

Serosubtype III-2 " " ~_ ~ 

Serotype VI ~ _~~ 

Serotype lb 

Serotype II/III-4 " ~ ~~~~~ 

Serotype VII " ~_ ~ _~ 

Serosubtype III-3 " ~ ~ ~ ~ZIII~~ 

Serosubtype Ia-1 "II"" 

Serosubtype III-l ~ 

Serotype IV ~ _"I 

Serotype V ~~ 

Serosubtype^Ia-2 ttagctttqa taatatagga GAAAAGCGAA TCCAAACTTT TGAAGGATAT 

901 HI 

Serosubtype III-2 ~~ ~ _~~ ~ 

Serotype VI ™ ~~ 

Serotype lb '~ 

Serotype II/III-4 ~ "III 

Serotype VII " ~ _~ III II" 

Serosubtype III-3 *~ III 

Serosubtype Ia-1 " 

Serosubtype III-l IIII II 

Serotype IV " II II 

Serotype V ™ 2 w 

Serosubtype^Ia-2 ™~ TTCTAT GAAATTCTAT AAATATAGTC ACCTTATAGC 



951 1000 



Serosubtype III-2 

Serotype VI t ~ 

Serotype lb 

Serotype II/III-4 ~ t_ 

Serotype VII t_ 

Serosubtype III-3 

Serosubtype Ia-1 " 

Serosubtype III-l 
Serotype IV < \ 

Serotype V . ' 

Serosubtype Ia-2 ; 

Consensus AAAACGATTT TTGGATATCA CGGGTGCTAT TATAGGTTTG CTCATATGTG 

1001 1050 

Serosubtype III-2 , 

Serotype VI c 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 a ■ 

•Serosubtype Ia-1 ■ a 

Serosubtype III-l ■ 

Serotype IV « — a 

Serotype V a 

Serosubtype Ia-2 — a 

Consensus GCATTGTGGC AATTTTTCTA GTTCCGCAAA TCAGAAA AGA TGGTGGACCG 

1051 1100 

Serosubtype III-2 

Serotype VI 

Serotype lb . 

Serotype II/III-4 

Serotype VII 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV ■ 

Serotype V 

Serosubtype Ia-2 — 

Consensus GCTATCTTTT CTG AAAATAG AGTAGGTCGT AATGGTAGGA TTTTTAGATT 
cpsES2 

1101 1150 

Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 : — 

Serosubtype Ia-1 

Serosubtype III-l — 

Serotype IV ; >. — 

Serotype V — 

Serosubtype Ia-2 

Consensus CTATAAATTC AGATCAAT GC GAGTAGATGC AGAACAAATT AAG AAAGATT 

cpsEA3 

1151 1200 

Serosubtype III-2 — — a 

Serotype VI -■ « — a 

Serotype lb — g 

Serotype II/III-4 

Serotype VII — •« 

Serosubtype III-3 a 

Serosubtype Ia-1 — r 

Serosubtype III-l a 

Serotype IV a 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



TATTAGTTCA CAATCAAATG ACAGGGCTAA TGTTTAAGTT AGACGATGAT 



1201 



1250 



CCTAGAATTA CTAAAATAGG AAAATTTATT CGAAAAACAA GCATAGATGA 



1251 



1300 



GTTGCCTCAA TTCTATAATG TTTTAAAAGG TGATATGAGT TTAGTAGGAA 



1301 _ ^ 

Serosubtype III-2 II" 

Serotype VI _~_I_i 

Serotype lb I I _ _ 

Serotype II/III-4 IIIIIIIIII II-II 

Serotype VII ; ~ "™ 

Serosubtype III-3 I~„I II-I - 

Serosubtype Ia-1 *" 

Serosubtype III-l " ~" 

Serotype IV ~ "~ ~_2II_III- , 

Serotype V ~ ~ " ~~ 

Serosubtype^Ia-2 GAATATGAAA ACTMAATTC AACGCAGAAG 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 

Serotype V 

Serosubtype Ia-2 — 

Consensus CGACGCCTTA GTTTTAAGCC AGGAATCACT GGTTTGTGGC AAATATCTGG 

1401 1450 

Serosubtype III-2 

Serotype VI • — 

Serotype lb — : . — — „ 

Serotype II/III-4 . 

Serotype VII 

Serosubtype III-3 — c~ 

Serosubtype Ia-1 c — : ~ 

Serosubtype III-l . 

Serotype IV • 

Serotype V : - 

Serosubtype Ia-2 — ; c 

Consensus TAGAAATAAT ATTACTGATT TTGATGAAAT CGTAA AGTTA GATGTTCAAT 

1451 1500 

Serosubtype III-2 . — 

Serotype VI — a 

Serotype lb - : ~ g 

Serotype II/III-4 

Serotype VII — 

Serosubtype III-3 

Serosubtype Ia-1 — 

Serosubtype III-l — 

Serotype IV 

Serotype V — 

Serosubtype Ia-2 « 

Consensus ATATCAATGA ATGGTCTATT TGGTCAGA TA TTAAGATTAT TCTCCTAACA 
cpsES3 

1501 1550 

Serosubtype III-2 -t — c — 

Serotype VI -t c — 

Serotype lb -t c — 

Serotype II/III-4 t 

Serotype VII t 

Serosubtype III-3 1 • 

Serosubtype Ia-1 — : 1 

Serosubtype III-l -t c — 

Serotype IV > 1 

Serotype V -t —c — 

Serosubtype Ia-2 1 

Consensus CTAAAGGTAG TCTTACTTGG GACAGG AGCT AAGTAAAGGT AAGGTTTGAA 

cpsE | cpsEFA 
1551 1600 

Serosubtype III-2 

Serotype VI — c 

Serotype lb — : : ■ c 

Serotype II/III-4 : 

Serotype VII 

Serosubtype III-3 

Serosubtype Ia-1 

Serosubtype III-l 

Serotype IV r 
Serotype V ~ ~ 

Serosubtype Ia-2 

Consensus AGGAATATAA TGAAAATTTG TCTGGTTGGT TCAAGTGGTG GTCATCTAGC 

| cpsF 

1601 „J^Ht 

Serosubtype III-2 ~ _ 

Serotype VI 

Serotype lb _"~ 

Serotype II/III-4 ~™ 

Serotype VII t 1 " ™"~Z- 

Serosubtype III-3 

Serosubtype Ia-1 " ~ ~~ 

Serosubtype III-l 

Serotype IV t ~ 

Serotype V " _~ 

Ser05Ub cSsensus ACACTTGAA.C CTTTTGAAAC CCATTTGGGA AAAAGAAGAT AGGTTTTGGG 



1651 „_™ 

Serosubtype III-2 " ZZ—ZZZZZZZ 

Serotype VI ' 

Serotype lb 1 ~~ ZZZ 

Serotype II/III-4 ~" ZZZZZZZ— ZZ 

Serotype VII ™~~ ZZZZZZ^Z 

Serosubtype III-3 ~I_ 

Serosubtype Ia-1 ~ 

Serosubtype III-l ZZZ Z ZZZ ZZZZ 

Serotype IV ~ ~_ ~ 

Serotype V " 

Serosubtype^Ia-2 ^^AAGAT GCTAGGAGTA TTCTAAGAGA AGAGATTGTA 

1701 _ ™ 

Serosubtype III-2 ZZ_ ZZZZ I 

Serotype VI 

Serotype lb ~ 

Serotype II/III-4 "I™ ZZZZZZ 

Serotype VII " "ZZZZ^ZZZZ 

Serosubtype III-3 ~ _ ~~ 

Serosubtype Ia-1 

Serosubtype III-l " ~" 

Serotype IV ~ZZ__Z_Z 

Serotype V 

Serosubtype^Ia-2 ^CTTTCCAAC AAACCGTAAT GTCAAAAACT TGGTAAAAAA 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 



1751 1800 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



TACTATTCTA GCTTTTAAGG TCCTTAGAAA AGAAAGACCA GATGTTATCA 



1801 



1850 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



TATCATCTGG TGCCGCTGTA GCAGTACCAT TCTTTTATAT TGGTAAGTTA 



1851 



cpsFS 



1900 



TTTGGTTGTA AGACCGTTTA TATAGAGGTT TTCGACAGGA TAGATAAACC 



1901 



cpsFA 



1950 



AACTTTGACA GGAAAATTAG TGTATCCTGT AACAGATAAA TTTATTGTTC 



1951 



2000 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 

Serotype V 



Serosubtype^ CTT^CCTA AGGCAATTAA TTTAGGAGGA 



2001 



2050 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



Serosubtype III-2 
Serotype VI 
Serotype Ib 
Serotype II/III-4 
Serotype VII 
Serosubtype III-3 
Serosubtype Ia-1 
Serosubtype III-1 
Serotype IV 
Serotype V 
Serosubtype Ia-2 
Consensus 



ATTTTTTAAT GATTTTTGTC ACAGTGGGGA CACATGAACA GCAGTTCAAC 
cpsF | cpsG 



2051 



2100 



CGTCTTATTA AAGAAGTTGA TAGATTAAAA GGGACAGGTG CTATTGATCA 
2101 2150 



Igaagtgttc ATTCAAACGG GTTACTCAGA CTTTGAACCT CAGAATTGTC 

cpsGS 
2200 

2151 



Serosubtype III-2 

Serotype VI 

Serotype lb 

Serotype II/IH-4 ~ _~ Z-ZZ~a g- 

Serotype VTI ~~ ~~ 

Serosubtype III-3 

Serosubtype Ia-1 _„„ 

Serosubtype III-l ' 
Serotype IV 

Serotype V — 

Serosubtype Ia-2 ; — : 

Consensus AGTGGTCA AA ATT TCTCTCA TATGATGATA TGAACTCTTA CATGAAAGAA 

cpsGAl cpsGA2 

2201 2226 

Serosubtype III-2 

Serotype VI 

Serotype lb c 

Serotype II/III-4 

Serotype VII 

Serosubtype III-3 ; 

Serosubtype Ia-1 ■ 

Serosubtype III-l 

Serotype IV 

Serotype V 

Serosubtype Ia-2 

Consensus GCTGAGATTG TTATCACACA TGGCGG 
cpsGA3 

Notes. 

Numbering start point "1" refers to the start point "1" of GenBank accession number AF332908 (for 
serotype IV reference strain 3139). 

Serosubtype Ia-1: strain 090, GenBank accession number AF332893; 

Serosubtype Ia-2: strain NZRM 908 (NCDC SS615), GenBank accession number AF332894; 

Serotype Ib: strain H36B, GenBank accession number AF332903; 

Serotype II/III-4: strain 18RS21, GenBank accession number AF332905; 

Serosubtype III-1: strain SG99/056, GenBank accession number AF332899; 

Serosubtype III-2: strain M781, GenBank accession number AF332896; 

Serosubtype III-3: strain NZRM 912 (NCDC SS620), GenBank accession number AF332897; 

III-4 (Subtype III-4): strain SG96/220, GenBank accession number AF363036; 

Serotype IV: strain 3139, GenBank accession number AF332908; 

Serotype V: strain CJB 111, GenBank accession number AF332910; 

Serotype VI: strain SS1214, GenBank accession number AF332901; 

Serotype VII: strain 7271, GenBank accession number AF332913. 
Figure 3. Multiple sequence alignments of the cpsG-cpsH-cpsI/M gene sequences 
for serotypes Ia, Ib, II, III, IV, V and VI (start and stop codons are 
highlighted). 
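
In the alignment, the Consensus row carries an asterisk under each column in which all aligned serotype sequences have the same base. As a minimal illustrative sketch only (it is not part of the original figure; the function name `conservation_line` and the choice of `-` as the marker for non-conserved columns are assumptions made here), such a row could be derived in Python as follows:

```python
def conservation_line(aligned, conserved="*", variable="-"):
    """Return one marker per alignment column.

    `conserved` is emitted where every sequence has the same character,
    `variable` otherwise. Both marker characters are illustrative choices.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        conserved if len({seq[i].upper() for seq in aligned}) == 1 else variable
        for i in range(length)
    )


# Positions 11-20 of the cpsG alignment as printed in Figure 3
# (serotypes IV, V, Ia, Ib, III and VI, in that order).
block = [
    "TCACAGTGGG",
    "TCACAGTGGG",
    "TCACAGTGGG",
    "TCACAGTAGG",
    "TCACAGTGGG",
    "TCACAGTGGG",
]
print(conservation_line(block))  # prints *******-**
```

Only the serotype Ib row differs in this block (A instead of G at the eighth column), so a single column is marked as non-conserved.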



              1                                                   50
Serotype IV   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype V    ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ia   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype Ib   ATGATTTTTG TCACAGTAGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype III  ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Serotype VI   ATGATTTTTG TCACAGTGGG GACACATGAA CAGCAGTTCA ACCGTCTTAT
Consensus     ********** *******_** ********** ********** **********
              cpsG

              51                                                 100
Serotype IV   TAAAGAAGTT GATAGATTAA AAGGGACAGA TGCTATTGAT CAAGAAGTGT
Serotype V    TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ia   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype Ib   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype III  TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Serotype VI   TAAAGAAGTT GATAGATTAA AAGGGACAGG TGCTATTGAT CAAGAAGTGT
Consensus     ********** ********** *********_ ********** **********

              101                                                150
Serotype IV   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype V    TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ia   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype Ib   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Serotype III  TCATTCAAAC GGGTTACTCA GACTTCGAAC CTCAGAATTG TCAGTGGTCA
Serotype VI   TCATTCAAAC GGGTTACTCA GACTTTGAAC CTCAGAATTG TCAGTGGTCA
Consensus     ********** ********** *****_**** ********** **********

              151                                                200
Serotype IV   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype V    AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ia   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype Ib   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype III  AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Serotype VI   AAATTTCTCT CATATGATGA TATGAACTCT TACATGAAAG AAGCTGAGAT
Consensus     ********** ********** ********** ********** **********

              201                                                250
Serotype IV   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype V    TGTTATCACA CATGGCGGCC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ia   TGTTATCACA CATGGCGGTC CAGCGACGTT TATGAATGCA GTTTCTAAAG
Serotype Ib   TGTTATCACA CACGGCGGTC CAGCAACGTT TATGAATGCA GTTTCTAAAG
Serotype III  TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Serotype VI   TGTTATCACA CATGGCGGCC CAGCGACGTT TATGTCAGTT ATTTCTTTAG
Consensus     ********** **_*****-* ****_***** ****   *   _*****  **
Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



251 

GGAAAAAAAC 
GAAAAAAAAC 
GGAAAAAAAC 
GGAAAAAAAC 
GGAAATTACC 
GGAAATTACC 



TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
TATTGTGGTT 
AGTTGTTGTT 
AGTTGTTGTT 



CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGACAAG 
CCTAGGAGAA 
CCCAGGAGAA 
**_*+ *- 



AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AACAGTTTGG 
AGCAGTTTGG 
AGCAGTTTGG 



301 

AATAATCATC 
AATAATCATC 
AATAATCATC 
AATAATCATC 
AATGATCATC 
AATGATCATC 

351 

TGATATCGTT 
TGATATCGTT 
AGATTATATT 
AGATTATATT 
GGCTTGGATT 
GGCTTGGATT 
** 



AGGTGGATTT TGTTAATAAG GTAAAAACAA 
AGGTGGACTT TGTTAATAAG GTAAAAACAA 
AGGTGGATTT TTTGAAAGAG TTATTCTTGA 
AGGTGGATTT TTTGAAAGAG TTATTCTTGA 
AAATACAATT TTTAAAAAAA ATTGCCCACC 
AAATACAATT TTTAAATTCG ATTGCCCACC 
* * -* 



GTAGATATTG 
GTAGATATTG 
TTGAATATCA 
TTGAATATCA 
GAAGATGTAG 
GAAGATGTAG 



AAAGGTTACA 
AAAGGTTACA 
GTGAATTAGA 
GTGAATTAGA 
ATGGACTTGC 
ATGGACTTGC 



AAATGTAGTC 
AAATGTAGTC 
GAATATTATT 
GAATATTATT 
GGAAGCGTT. 
GGAAGCGTT. 
* +- 



300 

AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
AGAGCATGTG 
TGAACATATC 
TGAACATATC 

350 

TGTATAATTT 
TGTATAATTT 
AAATTGAATT 
AATATGAGTT 
TGTATCCCTT 
TGTATCCCTT 
* ** 

400 

TATGAGGGGA 
TATGAGGGAA 
AAGGAAAAAA 
AAGGAAAAAA 
. , GAAAAGGA 
. . GAAAAGGA 
* * 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



401 

CGATGAATCG 
TGATGAATCG 
AT AT AT CT AC 
ATATATCTAC 
ATATAGCTAC 
ATATAGCTAC 



TCCGTTTTTA 
TCCGTTTTTA 
TAGTAAAGTA 
TAGTAAAGTA 
AGAAAAATAT 
AGAAAAATAT 



450 

GAAACTAACA GAAGTAATTT TATT 

GAAACTAATA GTAGTAATTT TATT 

ATATCACAAA ACAATGATTT TTGTTTCTCT 
ATATCACAAA ACAATGATTT TTGTTCCTCT 

CAGGGAAATA ATGATATGTT TTGT 

CAGGGAAATA ATGATATGTT TTGT 



451 



500 



GAAGAA TTTAAGGTAA TATTAAAGGA 

. , .GAAGAA TTTAAGGTAA TATTAAAGGA 
TTCAAAAATG AACATTTCAT AaACTATTTG AATAAATATA TTTTGTTGGA 
TTCAAAAATG AAC. .TTTCT AAACTATTTG AATAAATATA TTTTGTTGGA 
TTCAAAAATG AAC ..ill a^TAGAAAA AATTATAGGT 

CATA AATTAGAAAA AATTATAGGT 

* +_* * — ** **- 



501 

GTTGTGTGAT 
GTTGTGCGAT 
GAAAAAAATT 
GAAAAAAATT 
GAAATATGAG 
GAAATATGAG 



GAAA 

GAAA 

GAAATTAACA 
GAAATTAAC . 
GAAAT, . . - A 
GAAAT. . . .A 
**** 



. ATCAATAAA 
.ATCAATAAA 
TATCAATCCA 
TATCAATCCA 
TCTAGATTTA 
TCTAGATTTA 



AACTCTTTAT 
AACTCTTTAT 
AAGTATTTGT 
AAGTATTTGT 
GATTATTCTT 
GATTATTCTT 



550 

TTTATATTGC 
TTTATATTGC 
TAATAGGAGG 
TAATAGGAGG 
TATTTTATGC 
TATTTTATGC 



+ ** + _+-+_*+ — + * *- 
Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 
551 600 
AATATTTTTA GTTAATTTTT TTAAATCACT AGGTTTAGGA GAGGGGAACT 
AATATTTTTA GTTAATTTTT TTAAATCACT GGGTTTAGGC GAGGGAAACT 
AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT 
AATTTTCGCT TTAACCCTAT TTTCAAAGCC AATGCAACTT TTGTTACTTT 
TCTTTGGGTA CTTATTTTAG TACCAAACCA ATGGTATCAG TTTTTAATTA 
TCTTTGGGTA CTTATTTTAG TACCAAACCA ATGGTATCAG TTTTTAATTA 
— *_* * * * , . a 

cpsH 

601 650 
CAACTTACAA AATAGTGATG TTTGTTGCAA TCTTCTTGTG TGGAATAAAA 
CAGCTTACAA AATAGTGATG TTAGTTGCAA TTTTACTGTG TGGAATAAAA 
TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAAGAA AAAAATGAAA 
TAGCATTAAT AGTTTTACTT ATTTGTAGTA GTTATAATGA AAAAATGAAA 
TTACCATTAT AGTTCTATTA TTACTTTGGA AGAGTGAGTT TAGAAT. . .A 
TTACCATTAT AGTTCTATTA TTACTTTGGA AGAGTGAGTT TAGAAT. . .A 



651 

TTTTTA. . . . 
TTTTTA. • , , 
TTTTTATATA 
TTTTTAAATA 
TCTATAAGCA 
TCTATAAGCA 
+_*_++ 



. . TTAGATAG 
. . TTAGATAG 
TGGCTGAAAT 
TGGCTGAAAT 
ATTCTTCAAT 
ATTCTTCAAT 
*- 



CCTTTATTTT 
CCTTTATTTT 
TTTTTTCATT 
TTTTTTCATT 
ACTATTTCTG 
ACTATTTCTG 
*_ 



GAAAGAAGAA 
GAAAGAAGAA 
GTATTTTATA 
GTATTTTATA 
CTTTGGTTAT 
CTTTGGTTAT 



700 

AACTCGTTAT 
AACTCGTGAT 
TCATTTATTT 
TGGTTTATTT 
TTATTTATTT 
TTATTTATTT 
* * 

750 

TTTGTTCATA 

TTTGTTCATA 

TTTGATAGAA 

TTTCATAGAG 

TTTCAGCGAT 

TTTCAGCGAT 
*** 

800 

TTTAATTTTT 
TTTAATTTTT 
AGTGGCTTTG 
CTTATTATTT 
TTTATTTTTT 
TTTATTTTTT 
— * 



701 

CATCTTTTTA TTATTTATTG CGACCATTTT GAATTTATTC 

CATCTTTTTA TTATTTATCG CGACCATTTT GAATTTATTC 

AACTTCAATA TTGCTACATT CTTTGTTTAA AACTCCTGAT 

AGTATCAATA GTATTAAATT CGTTATTTAG AAGTCCAGAA 

ATTTGCAATA CTCATTAGAG GTACTCAAGA GGATATAACG 

ATTTGCAATA CTCATTAGAG GTACTCAAGA GGATATAACG 
_™„„** _* * ^ 

751 

AGGTTACTTT TATATTAA \ C 

AGGTTACTTT TATATTAA. . . . ... C 

TTTTAGCAGC TTTTAACTCG TTGATTATCG GTATAGTATC 

TCATTGCTGC ATTCAATTCA CTGGCAGTAG GGGTTGTGTC 

TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC 

TTATTGCTGA GCTATTAAAA CTAATTAGTA CAGGATATGC 
* — + — * . * 



801 850 
TTTCTAGCAT TAAAGGATAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT 
TTTCTAGCAT TAAAGGACAT CTCTCTAAAA AAAGCTTTCT CTATAATAAT 
AAACGGTGGT ATAAGAATAC AACTTTGGAG TTAGATAAAA TATTAAAAGC 
TACCATTACT ATAAGAATAC TAATATTGAA TTAACAAAAT TGCTAAAATC 
TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAATGT 

TATAATTATT ATAGAAAAGC TGATTTTAAT AGTTCAGTTG TAAGGAATGT 
•*• * + *_* + 
Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



851 

AGGATCGCGT 
AGGATCGCGT 
ATTTTTATTT 
ATTTTTGTTT 
GGTAAAGGTT 
GGTAAAGGTT 

901 

ATTTAATAGA 
ATTTAATAGA 
ATTGTTTGCA 
ATGCCATATA 

TATT 

TATT 



ATTTTGGGAG 
ATTTTGGGAG 
AATGGGTTAA 
AATGCAATTA 
AACTATTTTG 
AACTATTTTG 



TTCTATTAAA 
TTCTATTAAA 
TCCTATTTTT 
TTTTGTTTTG 
TGTTGTTTCT 
TGTTGTTTCT 
* 



TCAAATTTTT 
TCAAATTTTT 
TTTAGGGGGA 
TTTAGGATTT 
TATAACAGTT 
TATAACAGTT 



AATTAAATAT 
AATTAAGTAT 
TAATAATATT 
TTTTGATGTA 
TTTTCCTATG 
TTTTCCAAAT 



ATCAATTTTT 
GTCAATTTTT 
CAAAATATCA 
GAGAATGTAA 
CTGAAGCCAA 
GAATTTACTA 



ATAGGGATGG 
ATAGGGATGG 
GTATTTTTGG 
GTCTTTTTGG 
CTTTATTTGG 
CATTCCTAGG 



900 

GTGAAATTAG 
GTGAAATTAG 
ACATATTATT 
CTATATTATT 
TTATATT • . . 
TTATATT . . . 

950 

ACAATTTATT 
ACAATTTATT 
TAGAGATTTG 
AAGAAATTTA 
AAGAGAATTG 
AAGAGATTTA 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



951 

CTGAGAAGTG 
CTGAGAAGTG 
ATTGGGTCAG 
ATTGGATCAG 
TTTTCAATAG 
TTTTCAATTG 



ACT 
ACT 



1000 
.TAGG 
.TAGG 



ACTGGATTAA TGGTATGCAT ACTCAAAGAG CAATGGGATT 
AT TG GAT AAA TGGGATGCAT ACGCAGAGAG CAATGGCTTT 
AGTGGTTTCC ACATATG. . . AGAATAAGAC TTGCGGCATA 
AATGGATTCC TTCTATG . . . AAAGTTAGAC TTACTGCATA 



1001 

TTTTGGTCAT 
TTTTGGTCAT 
TTTTGAATAT 
CTTTGAATAT 
TTTTGAATAT 
TTTTGAGTAT 



CCTAACTTTA 
CCTAACTTTA 
TCAAACCTTA 
TCAAATCTTA 
GCTACACTAA 
GCAACACTAT 



TTCATAATTT 
TTCATAATTT 
TAATTCCTAT 
TAATACCCTT 
TTGGTCAGTT 
TAGGTCAGTT 



TTTTGCAGTA 
TTTTGCTCTA 
GACAGTGGTA 
AACTATCATA 
TATTTTATTT 
TATTTTATTC 



+★ 1 

TATATGTAAC ACTTTTTTAT AGAAAACTAA GAT . TAATAA 
TGTATATTGT ACTCAATTAT AAACGACTAA AGC.CTGTTG 
TATATATATA T.TATATGAA GTTAAGAAAC TATTCAATTA 
TATATATATA TATATATTAA GCAAAGATAT AGCTCAGGGA 
TAC TTTTTTTGAA ACCCCAAAAA CATATGGAAA 

TAT*. ! ! 1 ! TATTTTTAAA ACAGCAGAGG TATGGAGAAA 



1050 
ACTGTTTTTT 
ACTATTTTCT 
ACTAACTATA 
ACTAA.TATA 
TCTTATCCCA 
ACTTATCCGA 


1100 
CTATTGCTTT 
TGATGGTTTT 
TGACCATAGG 
TGATGATACT 
ATATTTTAAT 
ATATTTTTAT 



1101 

TATTTTAACT 
ATTTTTAACA 
TGTTGTATTA 
CGGTGCTCTT 
ATCCTTACTG 
CACACTATTC 



CTAAATTACT 
TTAAATTATT 
TTATTTACCT 
CTCTCCACTA 
TTGACTATAT 
CTAGTTTTTT 



TCTTGTATCA 
TATTGTACCA 
TTATTTTACC 
TTATACTACC 
GTTCATACTT 
GTGCATATTT 



GTATACTTAT 
ATATACTTTT 
TATTGGATCG 
CATCGGGTCT 
TTCTGGCGCT 
GACAGGGGCA 



1150 
TCAAGAACTG 
TCAAGGACAG 
GGCTCCAGGG 
GGATCTAGAG 
AGAATACTAT 
AGAATTTTCC 
Serotype V 
Serotype Ia 
Serotype Ib 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



1151 

GATATTATAT 
GGTATTATAT 
CTGGAATAGT 
CTGGTATTAT 
TGGTCTGTAT 
TAATTTGTAT 



AGTACTCTTA 
CGTAATTTTA 
AGCTATATTG 
AGTTGTGCTA 
GTTGGTTTTA 
GATAATTTTA 



TTTATACTTA 
TTTATTGTAC 
GCGCAGATGT 
CTACAGGTTA 
TTAGCATCGC 
TTAGGTTATT 



TTATATATGT 
TCATTTATGT 
TTATTCTTCT 
TAATTTTATT 
TTCTTTTAGA 
TACTCTTAGA 



1200 
TACAAAGAAT 
GACAAAGAAT 
TCTAAATACA 
GTTGAATACA 
TTATATCCTT 
AATAATCATT 



1201 

AACCTGATAA 
AGCTTAATAA 
GTTGTCGTAA 
ATTGTAATAA 
TTTAAAACTA 
AATAAATTTA 



GGAAAATTTT 
AAAGAGTAFT 
AGAAGAAAAC 
AAAGACAAAC 
ATTTGAAATT 
ACCTAAAAAT 



TATGATAGTT 
TATGAAATTA 
TATAAAATTT 
GATAAGATTT 
GACCAAGAAA 
TACTAAAAAA 



GCTCCGTACA 
GCACCCTATG 
TTATTGTACA 
TTCCTGTATT 
AACACTTTTA 
GCTGTCTTTT 



1250 
TACAACTGTT 
TACAATTTTT 
TACTTCCGTT 
TAGTTCCGAT 
TACTTGGTAT 
TGATAATTAT 



1251 1300 

Serotype IV CTTGTTAGCA TTTACTTTTC TTTGCTCTAC TATTTTTTTC AACTCAAATT 

Serotype V TTTATTAGTA TTTACCTTTT TGAGTTCTAC AATTTTTTTT AATTCAAATT 

Serotype Ia TCTACTAGTA ATAGTAATGA TGTTATATTT TGATAACTTA CTATCTATAT 

Serotype Ib ACTAATATTA CTATTAGTGA TATTACGTTT TGATAATTTG GTGAGCATAT 

Serotype III GACTTTCTTA TTTATCACCG CTTGTTTTTC TTATAACATA TGGTCAATAA 

Serotype VI AGGGATAATA TTATTATTGG TATGTTTTTC TTACAAAGTG GAGTCTATTA 

Consensus * * -* * — *_ ★ 

1301 1350 

Serotype IV TTGTTCAAAA ATTAGATAGC CTTTTGACAG GTAG 

Serotype V TTGTTCAAAA ATTAGATGTT CTTTTAACAG GTAG 

Serotype Ia ATTATCGTAT AATTAATTTG CGATCCGGGA GTAGTGAATC CAGATTTTCT 

Serotype Ib ATAATAGAAT AATCAATTTG CGGTC.GGGAA GTAGTGAATC TAGATTTTCT 

Serotype III TTGAAAAAAT AATTATGTAC AGAAACCAAA GTACTATCAC TAGGATGATA 

Serotype VI TCAATTATAT AATACACTAT AGATTTCAAA GTAGTAGTAC AAGATTGACA 

Consensus *- *-* *** r 

1351 1400 

Serotype IV . . GTTAAACT ATGCTCATTT ACAGCTTGTA GACGGCTTAA CTCTTTTTGG 

Serotype V . .ATTACACT ATGCTCATTT ACAACTTGTA GATGGTTTAA CTCCTTTTGG 

Serotype Ia GTATATAAAG ATACAGTAAA CATCGTTATA AATAATTCTT TATTATTTGG 

Serotype Ib TTGTACAAGG ATACCGTACA CTCAGTAATT ACTGACTCAC TATTTCTGGG 

Serotype III GTTTATCAAG AAAGTATTAT TGAAGTTCTA AAAGGAAATA TTTTATTTGG 

Serotype VI GTCTATTACG AAAGTATAAG AGCGATTTTA GATGGGAATT TCCTTATTGG 

Consensus * — -* — * — * — *- *_*+ 

1401 1450 

Serotype IV AAATAGTTTT AAGGAG A CGAGTGTCCT 

Serotype V AAATAGTTTT AAGGAA A CAAGTGTCCT 

Serotype Ia AGAAGGAGTT AAAGAGTTAT GGTTAAATAG TGATCTACCT TTGGGGTCGC 

Serotype Ib AAAAGGTGTA AAAGAATTGT GGTTAAATAG TGATTTACCA CTAGGATCGC 

Serotype III ACAGGGTATA AGGA. . .TTC CATCAAGTGA AGGAATATTC CTAGGATCGC 

Serotype VI GCAAGGTATA AGAG . . . TTC CCTCCAGTGT GGGAATATTT TTAGGTTCAC 

Consensus — + — * — *- * + — +* — 
Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



1451 

ATTTGATAAT 
ATTTGATAAT 
ATTCAACGTA 
ATTCGACCTA 
ATTCTACTTA 
ATTCATCATA 
*** 



AGCTACTCTA 
AGCTACTCTA 
TATAGGCTAT 
CATAGGTTAT 
TATTAGTGXC 
CATTAGTATA 



TGTTATTGAG 
TGTTATTGAG 
TTCTACAAAA 
TTCTATAAAA 
TTTTACAGGA 
TTTTATAGAA 



TATGTATGGT 
TATGTATGGT 
GTGGCCTGCT 
CTGGCCTATT 
CTTCTTTATT 
CTTCTTTTAC 



1500 
GTAGTACTTA 
GTAGTACTTA 
GGGATTAATG 
TGGACTAATA 
AGGAATTGTT 
GGGGCTGTTT 




1501 1550 

Serotype IV CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA AGTCAATGTA 

Serotype V CCATGTTTTG TATGATAATC TATTATATCT ATAGTAAAAA GATAATCATA 

Serotype Ia AATATAGTTC CAGGTTTGCT TTTAAT.TTT TACTAATATT GGTAGGAAAG 

Serotype Ib AATGTGATTT TAGGTTTGTT TCTAAT.TCT TATTAGCATT ATCAAGGAAG 

Serotype III CTTTATTTTT CTGCCTTTAT ACTTTTATAT AAAGAAGCGA TTTCAAAAAA 

Serotype VI CTTTTCTTTT CAATATTACT TTTTCTATAT AGAGAAGCTA TCAAACAAAA 

Consensus **- * — * 

1551 1600 

Serotype IV GTTGAGCTCC AGATACTTTT GTTTA TA 

Serotype V ATTGAACTTC AACTACTCCT ATTTA TA 

Serotype Ia CTAAACAATC AGCTTTTTAT TATGAGATAG TAGGAACACT TATAACTTTA 

Serotype Ib CTAAAAAGTC AGATTTCTAT TATGAGATAG TAGGGTCTGT CATACTCCTA 

Serotype III TTATAAAATC TACAGATTAT TTT T TTATACGTTA 

Serotype VI CAGGATAATC TACAAGCTTT TTT T TGGATTGTTA 

Consensus * * * — * 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



1601 

ATGTCTATAG 
ATGTCTATAA 
TTCTCATTTT 
TTTTCATTTT 
TTATGTTACA 
TTATTGTATA 



TATTATTTAC 
TATTATTTAC 
TTGCACTTGA 
TTGCACTTGA 
CGCTCTTTGA 
TGGTATTTGA 



AGAGAGTTTT 
TGAAAGTTTT 
AGATCTTGAC 
AGATATTGAT 
GGAAATAGAT 
AGAATTTGAT 



TACCCAAGTA 
TATCCCAGTG 
GGAGCTAATT 
GGCGCCAATT 
CCTAATCATT 
CCTAATCATT 



1650 
TAGTTATGAA 
TGGTAATGAA 
GGCTTATTGT 
GGCTCATTAT 
GGAGTATTGT 
GGAGTGTTGT 



1651 

TATTAGTTGG 
TATTAGTTGG 
TTTTATTTTT 
TTTTGTCTTT 
ATTATTATTC 
ATTGTTATTT 



ATGGTTTTTG 
CTAGTTTTTG 
ACAGTGTTAG 
ACAGTGTTGG 
TCAACTTTTG 

ACTACATTAG 




GGAAAATATT 
GTAAAATATT 
GAATTTTAGA 
GAATTTTAGA 
GTATAGTGGG 
GTATAGTAGG 



TTGTGGGGGT 
TTGTGATGGT 
AAATAAGGAT 
AAATAAGGAT 
AAGGGCTAA. 
GAGAG.GGA. 



1700 
GTAGATGATT 
ATCGAACCTA 
TTTTATAGTC 
TTCTATAGTC 



1701 1750 

Serotype IV TACAAC GAGAGTT CACTTGGACG GCAAATAAAA ATTAGTGTAA 

Serotype V TAAAAA AGGAATT TACT, . .ATT GTGAATAATA TATGACATAT 

Serotype Ia AACTTAAAAG GTGGAAAAGT TAATGGAAAA ACGAATACTT GTTTCTATCA 

Serotype Ib AACTTAAAAG GTGGGAAAGT TAATGGAAAA ACAAATACTT GTTTCTATCG 

Serotype III AAAAT GAAAGAAAAA GTAACAGTCA 

Serotype VI ATGAT AAAAAAACTA GTTAGTGTGA 

Consensus . 
1751 

Serotype IV TTGTACCAGT ATATAATTCG 
Serotype V TTGCTCTGAT ATGGCAGGAG 
Serotype Ia TTATACCTAT ATACAACTCA 
Serotype Ib TTATACCTAT ATACAACTCG 
Serotype III TTATACCTAT ATACAACTCA 
Serotype VI TTGTTCCAGT TTATAATTCG 
Consensus ** * * -* * 

1801 

Serotype IV ATTAGAAAAC AAACATATAA 

Serotype V ACATTATTGT TGGTTTGGAG 

Serotype Ia GTACTACAAC AGACTCATCC 

Serotype Ib GTCCTACAAC AGACTCATTC 

Serotype III GTACTACAAC AGACTCATCC 

Serotype VI TTGCTTCAAC AAACATACCC 

Consensus ; — 



1800 

AAACAATATT TAATAGCTTG CGTTGATTCA 
GTAAGGAAGG AAAATGATAC CTAAAGTTAT 
GAAGCATACC TTAAAGAATG TGTGCAATCC 
GAAGCATATC TTAAAGAATG CGTGCAATCC 
GAAGCATACC TTAAAGAATG TGTGCAATCC 
GAGTTAGTGA TTGAGAACTG TGTAGAATCT 



cpsl/M 

1850 

GAATTTGGAA ATTATTCTTG TTAATGATGG 
GAAATCCCTT ACCAGATAAT TTAAAGAAAT 
ATTGATAGAA GTTATACTAA TTGATGATGG 
ATTGATAGAA GTTATACTGA TTAATGATGG 
ATTGATAGAA GTTATACTAA TTGATGATGG 
AGAAATAGAA ATTTTATTAA TAGATGATGG 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



1851 

ATCAACAGAT 

ATATAAAAA. 

ATCCACTGAT 

ATCCACTGAT 

ATCCACTGAT 

ATCTACAGAT 
** — * *_ 



GGTAGTAAAG 
. . . CTTGGAG 
AATAGTGGAG 
AATAGTGGAG 
AATAGTGGAG 
AAAAGTAGTC 



AGTTATGTGA 
AGAACAATGT 
AAATTTGTGA 
AAATTTGTGA 
AAATTTGTGA 
ATATTTGTAA 



GGAGATAAGA 
CCGGATTATG 
TAATTTATCT 
TAATTTATCT 
TAATTTATCT 
TAATTTTTTA 



1900 
AAATCAGATG 
AAATTATTGA 
CAAGAAGATA 
CAAAAAGACG 
CAGGAAGATA 
AAAAGGGATA 



1901 1950 

Serotype IV AAAGAATTAA GACATTTCAC AAAACAAATG GAGGACAATC AAGCGCAAGG 

Serotype V ATGGAATGAG CATAATTATG ATGTTAGTAA AAATGTTTTT ATGAGAGAAG 

Serotype Ia ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG 

Serotype Ib ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTATC TTCGGCAAGG 

Serotype III ATCGCATACT TGTATTTCAT AAAAAAAATG GAGGGGTCTC TTCGGCAAGG 

Serotype VI GTCGCGTAAA AGTCTATCAT AAATACAATG GAGGTGCATC ATCAGCAAGA 

Consensus * — * * + -* *- * — * 

1951 2000 

Serotype IV AATTTAGGTA TTTTATACTC . TACAGGAGAT TTGATTGGTT TTGTTGACAG 

Serotype V CATATACTAA GAAGAATTT • TGCT TATGTTTCTG ACTATGCAAG 

Serotype Ia AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG 

Serotype Ib AACCTAGGTC TTGATAAATC CACAGGCGAA TTCATAACGT TTGTAGATAG 

Serotype III AACCTAGGTC TAGATAAATC CACAGGAGAA TTCATAACAT TTGTGGATAG 

Serotype VI AATGTGGGAC TTGAGATGGC AGAAGGTGAA TTTATAACTT TTGTAGATAG 

Consensus -* — *- ; * — * * * — +* 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



2001 

CGACGATACA 
ATTGGATATT 
TGATGATTTT 
TGATGATTTT 
TGATGATTTT 
CGATGATGTT 
★** 



ATTGACCCTA 
ATTTATACTT 
GTAGCACCGA 
GTAGCACCGA 
GTAGCACCGA 
GTCGCACTAA 
_* 



AAATGTATGA 
ATGGGGGGTT 
ATATGATTGA 
ATATAATTGA 
ATATGATTGA 
ATATGATTGA 



AACGTTACTA 
CTATCTAGAT 
AATAATGTTA 
AATAATGTTA 
AATAATGTTA 
AATTATGCTG 
* 



2050 
AATATATATG 
ACTGATGTGG 
AAAAATTTAA 
AAAAATTTAA 
AAAAATTTAA 
AATAATTTGT 
Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



Serotype IV 
Serotype V 
Serotype Ia 
Serotype Ib 
Serotype III 
Serotype VI 
Consensus 



2051 

AAGATGAACA 
AGCTTTTAAA 
TCACTGAGAA 
TCACTGAGGA 
TCACTGAGAA 
TAACGGAGAA 

2101 

AACGGTGTTA 
AGGGAGATTA 
GAGAGAGATT 
GAGAGAGATT 
GAGAGAGATT 
GA. . . TTTTT 



AGTAGACTGG 
AAGTTTAGAT 
TGCTGATATA 
TGCTGATATA 
TGCTGATATA 
CGCAGATATA 



GTGCAATGTA 
CCTTTGAGGA 
GCAGAAGTAG 
GCAGAAGTAG 
GCAGAAGTAG 
TCAGAAATTG 



ATCACAAAAA 
TTCATGAGTG 
ATTTTGA. . . 
ATTTTGA. . . 
ATTTTGA. . . 
ATTTCGA. . . 
_* * 



2100 
AATTTACTCT 
TTTTCTAGCA 
TATTTCGAAT 
TATTTCGAAT 
TATTTCGAAT 
AGTTTCAGAT 
** 



ACTTATATTA 
GTTGTGATGT 
ATAGAAAGAA 
ATAGAAAGAA 
ATAGAAAGAA 
ATAAAAGAAA 



TAATGGACCT 
GAATACAGGA 
GAAAAGACGA 
AAAAAGACGA 
GAAAAGACGA 
AAAAAGAAAA 



GAATACTATA 
TTAATAATTG 
AACTTTTATA 
AACTTTTATA 
AACTTTTATA 
GGTTACTATA 



2150 
ATGTGCTTAA 
GCGCTGTTAA 
AAGTCTTTAA 
AGGTCTTTAA 
AAGTTTTTAA 
GAGTTTTTCA 
* *+_* 



2151 

TAAACAAGAT 
AGGACATCAC 
AAACAATAAC 
AAACAATAAT 
GAATAATAAC 
AAACAATAAG 



TTCCTATACG 
TTTTTAAAAT 
TCTTTAAAAG 
TCTTTAAAAG 
TCTTTGAAAG 
TCTCTCAAAG 



AATTTCTGAG 
CAAATATGTC 
AATTTTTATC 
AATTTTTATC 
AATTTTTATC 
AATTTTTTTC 



TACAAATAAG 
TATATATGAC 
AGGCAATAGA 
AGGTAATAGA 
AGGTAATAGA 
AGGAAATAAA 
** 



2200 
ATTTTTAGTT 
AAAAGTGATT 
GTGGAAAATA 
GTGGAAAATA 
GTGGAAAATA 
GTAGAAAATG 



2201 

CAGTCTGCGA 
TAACTTCTCT 
TTGTTTGTAC 
TTGTTTGTAC 
TTGTTTGTAC 
TTGTTTGGGG 



GGGGTTGTTA 
TAATAAGACA 
AAAATTATAT 
AAAATTATAT 
AAAATTATAT 
GAAATTATAT 



TCTAGAGATT 
TGTGTAGAGG 
AAAAAAAGTA 
AAAAAAAGTA 
AAAAAAAGTA 
AAAAAAAGCA 



TAGCTTTAAA 
TTACAACTAA 
TAATTGGCAA 
TAATTGGTAA 
TAATTGGTAA 
TTATTGGGGA 
* * 



2250 
AATAAAATTC 
TTTATTGATA 
CTTGAGGTTT 
CTTGAGGTTT 
CTTGAGGTTT 
TTTACGATTT 



2251 

CGTGAAGAAA 
AACAGAGGGC 
GATGAGAACT 
GATGAGAATT 
GATGAGAACT 
AATGAAAAAT 



AAAAAT. . .A 
TTAAGA. . .A 
TAAAAATTGG 
TAAAAATTGG 
TAAAAATTGG 
ACAAAATTGG 



TGAAGATACA 
TAAGAATATT 
TGAGGATTTA 
TGAGGATTTA 
TGAGGATTTA 
TGAAGACTTG 



CAGTTTTATT 
ATTCAAAAGA 
CTTTTTAATT 
CTTTTTAATT 
CTTTTTAATT 
CTATTTAACT 



2300 
TTGATCTCAT 
TTGA. . TGAT 
GTAAACTCTT 
GTAAAATTTT 
GCAAACTCTT 
TTCAGATTTT 



2301 

AAAAAATGCT 
ATAACAATAT 
ATGTCAAGAG 
ATGTCAAGAG 
ATGTCAAGAG 
AAATAAAGAA 



AATAAGTTTG 
ATCCGAGAAA 
CACCGTATAG 
CACTGCATAG 
CACCGTATAG 
CATCGTATAG 



TTATTATAAG 
TTATTTTAAT 
TCGTAGATAC 
TCGTAGATAC 
TCGTAGATAC 
TTGTAGATAC 



CCAACCTTTT 
CCAAAGAATT 
GACTTCTTCC 
GACTTCTTCC 
GACTTCTTCC 
TAGAAGATCA 



2350 
TATAATTACT 
TATTAACA. . 
TTATATACTT 
TTGTACACCT 
TTATATACTT 
CTCTATACTT 

2351 2400 
Serotype IV ACTACAGAAA AAATAGTACA ACAACTTCCT CATATAGTAG CTATCAATGG 
Serotype V ..GGTAAGGT TGATTGTCTG ACTAGTGTTA CCTATTCTAT ACATCATTAC 
Serotype Ia ATCGAATTGT AAAAACTTCC GCAA.TGAAT CAGAAATTCA ACGAAAACTC 
Serotype Ib ATCGCATCGT AAAGACTTCT GCAA.TGAAT CAGGAGTTCA ACGAAAATTC 
Serotype III ATCGAATTGT AAAAACTTCT GTAA.TGAAT CAGAAATTCA ACGAAAACTC 
Serotype VI ATCGTATTGA AGAAAAATCT ATAA.TGAAT CAACAATTTA ATAAAAATAC 
Consensus * — * + - m + * 

2401 2450 
Serotype IV GACATAATCG ATATCTGTAC TGAGTGTTAT TATTATGCAA AGGATTTTAA 
Serotype V GAAGGAAGTT GGAAAAGTTC TTCATTTATT TCAGATTCTC TAAAGATTAG 
Serotype Ia ATTAGATTTT ATAACAATTT TTAATGAAGT AAGTAGTTTG GTTCCTGCCA 
Serotype Ib ATTAGATTTT ATAACAATTT TTAATGAAAT AAGCAGTATT GTTCCTGCAA 
Serotype III ATTAGATTTT ATAACAATTT TTAATGAAAT AAGTAGTTTG GTTCCTGCCA 
Serotype VI ATTAGACTTC ATTGATATTT TTAATGAGAT TCATCAGGAT AGTCCGACAG 
Consensus 

2451 2500 
Serotype IV TGGATTTGAA GAAGTTGCTT TTTCAAGATT ATTTGGTGCA TATTCGTTAG 
Serotype V AGTAAGGCTC ATAATTGATT TTTTATTTGG ATATGGTACT TATAGAATGC 
Serotype Ia AATTGGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT 
Serotype Ib AATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GGTAAAGTGT 
Serotype III GATTAGCTAA TTATGTTGAA GCGAAATTTT TAAGAGAAAA GATAAAGTGT 
Serotype VI AATTGTTTAA TTATGTGGAA GCGAAGTTTG TACGAGAAAA AATCAAGTGT 
Consensus — * — * * * . + + 

2501 2550 
Serotype IV TAGCTAATAA AATTGTATAT AATAAAGATT ATAGAAAAAC CGAAGAATTA 
Serotype V TTCTAAGGTT TCTAAAGTTA AAGAAATAG. 
Serotype Ia CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT 
Serotype Ib CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAGTA AAATCAAATT 
Serotype III CTCCGAAAAA TGTTTGAATT AGGTAGTAAT ATTGACAATA AAATCAAAGT 
Serotype VI TTAAGGAAAA TGTTTGAATT AGGAGAAATA GCTGATGAAA ATTTACGTTT 
Consensus *- + + 

2551 2600 
Serotype IV AGATAA 
Serotype V 
Serotype Ia ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG 
Serotype Ib ACAACGAGAG ATTTTTTTCA AAGATGTTAA ATTATACCCT TTCTATAAAG 
Serotype III ACAACGAGAG ATTTTTTTCA AAGACATTAA ATCATACCCG TTCTATAAAG 
Serotype VI ACAGAGATAT AAATTTTGGC AAGATATTAA ATCATATTCA ATATGCAAAG 
Consensus 

2601 2650 
Serotype IV 
Serotype V 
Serotype Ia CGGTAAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA 
Serotype Ib CGGTTAAGTA CTTATCATTA AAGGGATTAT TGAGTATTTA CTTAATGAAA 
Serotype III CGGTCAAATA CTTATCATTA AAGGGATTAT TAAGCTTTTA TTTAATGAAA 
Serotype VI CAATAAGGTT CTTATCTAAA AAACATATCT GTACGTTATA TTTGATGAAA 
Consensus 

2651 2700 
Serotype IV 

Serotype V 

Serotype Ia TGTOCACCTA AACTATATCT TATGGCATAT AGAAGATTCA AAACAGTAGC 

Serotype Ib TGTTCACCCA TCTTGTATAT AAAATTATAT GACAGGTTTC AAAAACAGTA 

Serotype III TGTTCACCTA AACTATATGT TATGGCATAT AGAAGATTTC AAAAACAGTA 

Serotype VI TATTTTCCGT ACGTATATAT AAAGATGTAT AATAAATTTC AAAAGCAATA 

Consensus 

2701 2728 

Serotype IV 

Serotype V 

Serotype Ia TGGAGAAATT GGGAAAGAGA ATTTATAA 

Serotype Ib A 

Serotype III G 

Serotype VI A 

Consensus — 



Notes. 

Serotype Ia: GenBank accession number AB028896; 
Serotype Ib: GenBank accession number AB050723; 
Serotype III: GenBank accession number AF163833; 
Serotype IV: GenBank accession number AF355776; 
Serotype V: GenBank accession number AF349539; 
Serotype VI: GenBank accession number AF337958. 
Figure 4. Two sites (*) of sequence heterogeneity between alp2 (AF208158, upper lines) and 
alp3 (AF291065, lower lines) used to distinguish them (relevant primers are shown). 

251 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 300 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
531 AAGGTAATCTTAATATTTTTGAAGAGTCAATAGTTGCTGCATCTACAATT 580 

bcaS1 

301 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 350 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
581 CCAGGGAGTGCAGCGACCTTAAATACAAGCATCACTAAAAATATACAAAA 630 

bcaS2 

351 CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC 400 
    ||||||*||||||||||||||||||||||||||||||||||||||*|||| 
631 CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC 680 

401 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 450 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
681 CTCAAAACCTCATTGTATTAAATCCATCAAGCTATTCAGCAAATTATTAT 730 

balS 

451 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 500 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
731 ATCAAACAAGGTGCTAAATATTATAGTAATCCGAGTGAAATTACAACAAC 780 

501 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 550 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
781 TGGTTCAGCAACTATTACTTTTAATATACTTGATGAAACTGGAAATCCAC 830 

551 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 600 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
831 ATAAAAAAGCTGATGGACAAATTGATATAGTTAGTGTGAATTTAACTATA 880 

601 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 650 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
881 TATGATTCTACAGCTTTAAGAAATAGGATAGATGAAGTAATAAATAATGC 930 

651 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 700 
    |||||||||||||||||||||||||||||||||||||||||||||||||| 
931 AAATGATCCTAAGTGGAGTGATGGGAGTCGTGATGAAGTCTTAACTGGAT 980 

balA 
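
The two heterogeneous sites in Figure 4 can be located by comparing the aligned rows column by column. The following is only an illustrative sketch, not part of the described method: it takes the rows at alp2 positions 351-400 and alp3 positions 631-680 as transcribed from the figure, and the function name `mismatch_sites` is hypothetical.

```python
# Aligned 50-base rows transcribed from Figure 4
# (alp2 positions 351-400, alp3 positions 631-680).
ALP2_ROW = "CGGAAACGCTTACATAGATTTATATGATGTAAAGAATGGATTGATTGATC"
ALP3_ROW = "CGGAAATGCTTACATAGATTTATATGATGTAAAGAATGGATTGATCGATC"


def mismatch_sites(upper, lower, upper_start, lower_start):
    """Return (upper_pos, lower_pos, upper_base, lower_base) for each differing column.

    Assumes a gap-free alignment, so both rows must have the same length.
    """
    if len(upper) != len(lower):
        raise ValueError("rows must be the same length (gap-free alignment assumed)")
    return [
        (upper_start + i, lower_start + i, a, b)
        for i, (a, b) in enumerate(zip(upper, lower))
        if a != b
    ]


sites = mismatch_sites(ALP2_ROW, ALP3_ROW, 351, 631)
for alp2_pos, alp3_pos, alp2_base, alp3_base in sites:
    print(f"alp2 {alp2_pos} {alp2_base} / alp3 {alp3_pos} {alp3_base}")
```

Applied to the other rows of the figure, the same comparison reports no differences, which is consistent with the caption's statement that only two sites distinguish the two genes in this region.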
Figure 5 



Genetic markers of GBS genotypes Strain No. (%) Genotypes 



V-a]p3-IS73*i-ISSa4 
V-alp3-IS73M-ISo , tf7-ISSa4 
V-alp3-IS13W 
V*>alp3 

V-alp3-IS73S7-lS$5i-GBSil 
Ib-alp3-IS73*7 

V- None-GBSil 
III-aIp3-IS13W-GBSa 
n-alp3-IS73&MS/5<tf-GBSil 
U-UI&-1S1381-ISJ548 

11-9\&-lS1381JS861 

n-a-ISJJM-ISSa*GBSa 

VI- as-I&73M 
VI-a-IS/3W 
TV-S-1S1381 
TV-*XS1381-1S861 
Ia-as-ISJ3M 
k-a-IS23M-IS557 
Ia-a-IS73M 
Ia-None-IS23M 
Ia-a-ISJ3M-ISSa4 
Ia-AaB5a-IS23M-ISoW 
Ia-a-ISStf/-GBSil 



1 (0.5) 

27+1 (14) 

1 (0.5) 

1 (0.5) 

1 (0.5) 

1 (0.5) 

1 (0.5) 

1 (0.5) 

1 (0.5) 

1 (0.5) 



n-Aa-ISJ3&MS£67-ISSa4 
II-AaB9-lSi3«-IS*67-ISSa4 
n-AaB4-IS73mSM7-ISSa4 
n-AaBl-ISJ3W-IS«67-IS/^ 
II-Aa-IS7357-ISS67-IS/545-GBSil 
H-AaB4-IS73&MS*o7-IS754o' 
YI-AaB3b-IS13fi2-lS*67 ~ 
VI-Aa-IS23W-ISM7-GBSfl 
VI-AaB7a-IS23*2-ISfftfJ 
It>-AaB3c4SJ357 
Ib-AaB7-IS23*2 
B>-AaB10-ISJ3SMSS67 
Ib-AaB10-ISi3«-IS56i-ISSa^ 
B)-AaBl-ISJ3«i-lS6^1 
Ib-AaBl-ISi3M 
Ib-AaB3-ISJ35i-IS«W 

B>-AaB4b-IS73M-lS$d7 

Jb-AaB%-lS138M$861 

Ib-AaB4-IS73S7-IS*67 

Ib-AaBla-IS23S2-lS56*7 

Ib-AaB2-IS73«7-IS*o"7 

Ib-AaBNl-IS73M-IS*6*7 

It>-AaBN2-IS73&MSS67 

Ib-AaB9a-IS73S2-ISS67 

ffl^AaB2-IS73g7-ISg6/-U^SU 

IH-R-ISS67-UBSU ~~ 

n-R-ISJ3Si-lS««-GBSil 

iy-R-IS7337-lS*6 < 7-GBSfl 
" in-R-IS73*2-IStfo7-lSi J«d 

VJKB3*rlS13811S861 

UI-aIp2as 

Ia-alp2as 

n~alp2as>GBSfl 



Total=56 genotypes 




1 (0.5) 
1 (0.5) 

Total=177+17 